2025-05-07T19:42:32.7756912Z Current runner version: '2.323.0'
2025-05-07T19:42:32.7762812Z Runner name: 'i-0009551202268beab'
2025-05-07T19:42:32.7763818Z Machine name: 'ip-10-0-19-107'
2025-05-07T19:42:32.7766502Z ##[group]GITHUB_TOKEN Permissions
2025-05-07T19:42:32.7769044Z Contents: read
2025-05-07T19:42:32.7769792Z Metadata: read
2025-05-07T19:42:32.7770346Z Packages: read
2025-05-07T19:42:32.7771016Z ##[endgroup]
2025-05-07T19:42:32.7773515Z Secret source: None
2025-05-07T19:42:32.7774496Z Prepare workflow directory
2025-05-07T19:42:32.8430220Z Prepare all required actions
2025-05-07T19:42:32.8468909Z Getting action download info
2025-05-07T19:42:33.0227199Z Download action repository 'actions/checkout@v4' (SHA:11bd71901bbe5b1630ceea73d27597364c9af683)
2025-05-07T19:42:33.3008448Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02)
2025-05-07T19:42:33.8772681Z Complete job name: build_artifact (x86, linux.24xlarge, genai, 3.9, 12.8.0, clang)
2025-05-07T19:42:33.9683808Z A job started hook has been configured by the self-hosted runner administrator
2025-05-07T19:42:33.9823439Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh'
2025-05-07T19:42:33.9834093Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-05-07T19:42:33.9835595Z ##[endgroup]
2025-05-07T19:42:35.1320367Z Runner Type: linux.24xlarge
2025-05-07T19:42:35.1320892Z Instance Type: c5.24xlarge
2025-05-07T19:42:35.1321232Z AMI Name: unknown
2025-05-07T19:42:35.1353447Z AMI ID: ami-071226ecf16aa7d96
2025-05-07T19:42:40.1750204Z ##[group]Checking docker version
2025-05-07T19:42:40.1763003Z ##[command]/usr/bin/docker version --format '{{.Server.APIVersion}}'
2025-05-07T19:42:40.1968204Z '1.44'
2025-05-07T19:42:40.1984902Z Docker daemon API version: '1.44'
2025-05-07T19:42:40.1985460Z ##[command]/usr/bin/docker version --format '{{.Client.APIVersion}}'
2025-05-07T19:42:40.2181752Z '1.44'
2025-05-07T19:42:40.2192909Z Docker client API version: '1.44'
2025-05-07T19:42:40.2197489Z ##[endgroup]
2025-05-07T19:42:40.2199988Z ##[group]Clean up resources from previous jobs
2025-05-07T19:42:40.2204769Z ##[command]/usr/bin/docker ps --all --quiet --no-trunc --filter "label=38d49b"
2025-05-07T19:42:40.2350291Z ##[command]/usr/bin/docker network prune --force --filter "label=38d49b"
2025-05-07T19:42:40.2494841Z ##[endgroup]
2025-05-07T19:42:40.2495280Z ##[group]Create local container network
2025-05-07T19:42:40.2506457Z ##[command]/usr/bin/docker network create --label 38d49b github_network_8b9c024716574df3abf522ec9f61fb6d
2025-05-07T19:42:40.4893757Z 05c86691a0cd4bad0680596a372ca49fdb6f6dbfff333a6ba5189cae791419e4
2025-05-07T19:42:40.4915626Z ##[endgroup]
2025-05-07T19:42:40.4946402Z ##[group]Starting job container
2025-05-07T19:42:40.4966165Z ##[command]/usr/bin/docker pull amazonlinux:2023
2025-05-07T19:42:40.6129315Z 2023: Pulling from library/amazonlinux
2025-05-07T19:42:40.6227834Z Digest: sha256:cb5b4c509d62ae388f674c139ae5e8281fc160c217d474445e912043e1941988
2025-05-07T19:42:40.6229343Z Status: Image is up to date for amazonlinux:2023
2025-05-07T19:42:40.6252109Z docker.io/library/amazonlinux:2023
2025-05-07T19:42:40.6342223Z ##[command]/usr/bin/docker create --name d27e40242d2e486782d6f283e7ffb112_amazonlinux2023_99beec --label 38d49b --workdir /__w/FBGEMM/FBGEMM --network github_network_8b9c024716574df3abf522ec9f61fb6d --user root -e "HOME=/github/home" -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/ec2-user/actions-runner/_work":"/__w" -v "/home/ec2-user/actions-runner/externals":"/__e":ro -v "/home/ec2-user/actions-runner/_work/_temp":"/__w/_temp" -v "/home/ec2-user/actions-runner/_work/_actions":"/__w/_actions" -v "/home/ec2-user/actions-runner/_work/_tool":"/__w/_tool" -v "/home/ec2-user/actions-runner/_work/_temp/_github_home":"/github/home" -v "/home/ec2-user/actions-runner/_work/_temp/_github_workflow":"/github/workflow" --entrypoint "tail" amazonlinux:2023 "-f" "/dev/null"
2025-05-07T19:42:40.6796902Z 684dd9e27594bd891e6be6ff1899ad1c0eb766565fb4b8e7ad6703697bf1bd21
2025-05-07T19:42:40.6821020Z ##[command]/usr/bin/docker start 684dd9e27594bd891e6be6ff1899ad1c0eb766565fb4b8e7ad6703697bf1bd21
2025-05-07T19:42:41.1939276Z 684dd9e27594bd891e6be6ff1899ad1c0eb766565fb4b8e7ad6703697bf1bd21
2025-05-07T19:42:41.1960134Z ##[command]/usr/bin/docker ps --all --filter id=684dd9e27594bd891e6be6ff1899ad1c0eb766565fb4b8e7ad6703697bf1bd21 --filter status=running --no-trunc --format "{{.ID}} {{.Status}}"
2025-05-07T19:42:41.2109126Z 684dd9e27594bd891e6be6ff1899ad1c0eb766565fb4b8e7ad6703697bf1bd21 Up Less than a second
2025-05-07T19:42:41.2127031Z ##[command]/usr/bin/docker inspect --format "{{range .Config.Env}}{{println .}}{{end}}" 684dd9e27594bd891e6be6ff1899ad1c0eb766565fb4b8e7ad6703697bf1bd21
2025-05-07T19:42:41.2266206Z GITHUB_ACTIONS=true
2025-05-07T19:42:41.2266577Z CI=true
2025-05-07T19:42:41.2266909Z HOME=/github/home
2025-05-07T19:42:41.2267541Z PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-05-07T19:42:41.2289361Z ##[endgroup]
2025-05-07T19:42:41.2300751Z ##[group]Waiting for all services to be ready
2025-05-07T19:42:41.2302638Z ##[endgroup]
2025-05-07T19:42:41.2387543Z ##[group]Run yum update -y; yum install -y binutils findutils git pciutils sudo tar wget which
2025-05-07T19:42:41.2388410Z yum update -y; yum install -y binutils findutils git pciutils sudo tar wget which
2025-05-07T19:42:41.2389529Z shell: bash --noprofile --norc -e -o pipefail {0}
2025-05-07T19:42:41.2389942Z env:
2025-05-07T19:42:41.2390280Z PRELUDE: .github/scripts/setup_env.bash
2025-05-07T19:42:41.2390616Z BUILD_ENV: build_binary
2025-05-07T19:42:41.2391029Z BUILD_TARGET: genai
2025-05-07T19:42:41.2391338Z BUILD_VARIANT: cuda
2025-05-07T19:42:41.2391596Z BUILD_CUDA_VERSION: 12.8.0
2025-05-07T19:42:41.2391954Z ##[endgroup]
2025-05-07T19:42:42.0522128Z Amazon Linux 2023 repository 68 MB/s | 37 MB 00:00
2025-05-07T19:42:48.6574259Z Last metadata expiration check: 0:00:07 ago on Wed May 7 19:42:41 2025.
2025-05-07T19:42:49.2163363Z Dependencies resolved.
2025-05-07T19:42:49.2339339Z Nothing to do.
2025-05-07T19:42:49.2340212Z Complete!
2025-05-07T19:42:49.4707653Z Last metadata expiration check: 0:00:08 ago on Wed May 7 19:42:41 2025.
2025-05-07T19:42:49.5337652Z Dependencies resolved.
2025-05-07T19:42:49.5561316Z ========================================================================================
2025-05-07T19:42:49.5562572Z Package Arch Version Repository Size
2025-05-07T19:42:49.5563279Z ========================================================================================
2025-05-07T19:42:49.5563788Z Installing:
2025-05-07T19:42:49.5564199Z binutils x86_64 2.41-50.amzn2023.0.3 amazonlinux 5.3 M
2025-05-07T19:42:49.5564852Z findutils x86_64 1:4.8.0-2.amzn2023.0.2 amazonlinux 539 k
2025-05-07T19:42:49.5565480Z git x86_64 2.47.1-1.amzn2023.0.2 amazonlinux 54 k
2025-05-07T19:42:49.5566082Z pciutils x86_64 3.7.0-3.amzn2023.0.2 amazonlinux 93 k
2025-05-07T19:42:49.5566703Z sudo x86_64 1.9.15-1.p5.amzn2023.0.1 amazonlinux 1.3 M
2025-05-07T19:42:49.5567259Z tar x86_64 2:1.34-1.amzn2023.0.4 amazonlinux 879 k
2025-05-07T19:42:49.5568233Z wget x86_64 1.21.3-1.amzn2023.0.4 amazonlinux 779 k
2025-05-07T19:42:49.5568852Z which x86_64 2.21-26.amzn2023.0.2 amazonlinux 42 k
2025-05-07T19:42:49.5569350Z Installing dependencies:
2025-05-07T19:42:49.5569860Z cracklib x86_64 2.9.6-27.amzn2023.0.2 amazonlinux 82 k
2025-05-07T19:42:49.5570564Z cyrus-sasl-lib x86_64 2.1.27-18.amzn2023.0.3 amazonlinux 786 k
2025-05-07T19:42:49.5571391Z elfutils-debuginfod-client x86_64 0.188-3.amzn2023.0.2 amazonlinux 41 k
2025-05-07T19:42:49.5572103Z git-core x86_64 2.47.1-1.amzn2023.0.2 amazonlinux 4.7 M
2025-05-07T19:42:49.5572959Z git-core-doc noarch 2.47.1-1.amzn2023.0.2 amazonlinux 2.8 M
2025-05-07T19:42:49.5573887Z gnutls x86_64 3.8.3-6.amzn2023.0.1 amazonlinux 1.1 M
2025-05-07T19:42:49.5574426Z groff-base x86_64 1.22.4-7.amzn2023.0.2 amazonlinux 1.0 M
2025-05-07T19:42:49.5574985Z gzip x86_64 1.12-1.amzn2023.0.1 amazonlinux 160 k
2025-05-07T19:42:49.5575614Z hwdata noarch 0.384-1.amzn2023.0.3 amazonlinux 1.6 M
2025-05-07T19:42:49.5576183Z jansson x86_64 2.14-0.amzn2023 amazonlinux 46 k
2025-05-07T19:42:49.5576754Z kmod-libs x86_64 29-2.amzn2023.0.5 amazonlinux 62 k
2025-05-07T19:42:49.5702606Z less x86_64 608-2.amzn2023.0.2 amazonlinux 168 k
2025-05-07T19:42:49.5703421Z libcbor x86_64 0.7.0-3.amzn2023.0.2 amazonlinux 57 k
2025-05-07T19:42:49.5703947Z libdb x86_64 5.3.28-49.amzn2023.0.2 amazonlinux 756 k
2025-05-07T19:42:49.5704442Z libeconf x86_64 0.4.0-1.amzn2023.0.3 amazonlinux 28 k
2025-05-07T19:42:49.5705002Z libedit x86_64 3.1-38.20210714cvs.amzn2023.0.2 amazonlinux 108 k
2025-05-07T19:42:49.5705631Z libfdisk x86_64 2.37.4-1.amzn2023.0.4 amazonlinux 153 k
2025-05-07T19:42:49.5706259Z libfido2 x86_64 1.10.0-2.amzn2023.0.2 amazonlinux 95 k
2025-05-07T19:42:49.5706809Z libmetalink x86_64 0.1.3-14.amzn2023.0.2 amazonlinux 31 k
2025-05-07T19:42:49.5707466Z libpwquality x86_64 1.4.4-6.amzn2023.0.2 amazonlinux 106 k
2025-05-07T19:42:49.5708123Z libsemanage x86_64 3.4-5.amzn2023.0.2 amazonlinux 121 k
2025-05-07T19:42:49.5708673Z libutempter x86_64 1.2.1-4.amzn2023.0.2 amazonlinux 26 k
2025-05-07T19:42:49.5709195Z nano x86_64 8.3-1.amzn2023 amazonlinux 706 k
2025-05-07T19:42:49.5709694Z ncurses x86_64 6.2-4.20200222.amzn2023.0.6 amazonlinux 394 k
2025-05-07T19:42:49.5710317Z nettle x86_64 3.10.1-1.amzn2023.0.1 amazonlinux 573 k
2025-05-07T19:42:49.5710845Z openldap x86_64 2.4.57-6.amzn2023.0.7 amazonlinux 256 k
2025-05-07T19:42:49.5711343Z openssh x86_64 8.7p1-8.amzn2023.0.14 amazonlinux 454 k
2025-05-07T19:42:49.5711883Z openssh-clients x86_64 8.7p1-8.amzn2023.0.14 amazonlinux 708 k
2025-05-07T19:42:49.5712392Z pam x86_64 1.5.1-8.amzn2023.0.4 amazonlinux 542 k
2025-05-07T19:42:49.5712883Z pciutils-libs x86_64 3.7.0-3.amzn2023.0.2 amazonlinux 41 k
2025-05-07T19:42:49.5713454Z perl-AutoLoader noarch 5.74-477.amzn2023.0.6 amazonlinux 22 k
2025-05-07T19:42:49.5713988Z perl-B x86_64 1.80-477.amzn2023.0.6 amazonlinux 179 k
2025-05-07T19:42:49.5714614Z perl-Carp noarch 1.50-458.amzn2023.0.2 amazonlinux 29 k
2025-05-07T19:42:49.5715161Z perl-Class-Struct noarch 0.66-477.amzn2023.0.6 amazonlinux 22 k
2025-05-07T19:42:49.5715815Z perl-Data-Dumper x86_64 2.174-460.amzn2023.0.2 amazonlinux 55 k
2025-05-07T19:42:49.5716362Z perl-Digest noarch 1.20-1.amzn2023.0.2 amazonlinux 26 k
2025-05-07T19:42:49.5716912Z perl-Digest-MD5 x86_64 2.58-2.amzn2023.0.2 amazonlinux 36 k
2025-05-07T19:42:49.5717447Z perl-DynaLoader x86_64 1.47-477.amzn2023.0.6 amazonlinux 26 k
2025-05-07T19:42:49.5718114Z perl-Encode x86_64 4:3.15-462.amzn2023.0.2 amazonlinux 1.7 M
2025-05-07T19:42:49.5718640Z perl-Errno x86_64 1.30-477.amzn2023.0.6 amazonlinux 15 k
2025-05-07T19:42:49.5719149Z perl-Error noarch 1:0.17029-5.amzn2023.0.2 amazonlinux 41 k
2025-05-07T19:42:49.5719675Z perl-Exporter noarch 5.74-459.amzn2023.0.2 amazonlinux 31 k
2025-05-07T19:42:49.5720226Z perl-Fcntl x86_64 1.13-477.amzn2023.0.6 amazonlinux 21 k
2025-05-07T19:42:49.5720805Z perl-File-Basename noarch 2.85-477.amzn2023.0.6 amazonlinux 18 k
2025-05-07T19:42:49.5721380Z perl-File-Find noarch 1.37-477.amzn2023.0.6 amazonlinux 26 k
2025-05-07T19:42:49.5721955Z perl-File-Path noarch 2.18-2.amzn2023.0.2 amazonlinux 36 k
2025-05-07T19:42:49.5722534Z perl-File-Temp noarch 1:0.231.100-2.amzn2023.0.2 amazonlinux 60 k
2025-05-07T19:42:49.5723178Z perl-File-stat noarch 1.09-477.amzn2023.0.6 amazonlinux 17 k
2025-05-07T19:42:49.5723853Z perl-FileHandle noarch 2.03-477.amzn2023.0.6 amazonlinux 16 k
2025-05-07T19:42:49.5724461Z perl-Getopt-Long noarch 1:2.52-2.amzn2023.0.2 amazonlinux 60 k
2025-05-07T19:42:49.5725070Z perl-Getopt-Std noarch 1.12-477.amzn2023.0.6 amazonlinux 16 k
2025-05-07T19:42:49.5725643Z perl-Git noarch 2.47.1-1.amzn2023.0.2 amazonlinux 42 k
2025-05-07T19:42:49.5726192Z perl-HTTP-Tiny noarch 0.078-1.amzn2023.0.3 amazonlinux 56 k
2025-05-07T19:42:49.5726744Z perl-IO x86_64 1.43-477.amzn2023.0.6 amazonlinux 87 k
2025-05-07T19:42:49.5727271Z perl-IPC-Open3 noarch 1.21-477.amzn2023.0.6 amazonlinux 23 k
2025-05-07T19:42:49.5727861Z perl-MIME-Base64 x86_64 3.16-2.amzn2023.0.2 amazonlinux 31 k
2025-05-07T19:42:49.5728445Z perl-Net-SSLeay x86_64 1.94-1.amzn2023.0.1 amazonlinux 392 k
2025-05-07T19:42:49.5728979Z perl-POSIX x86_64 1.94-477.amzn2023.0.6 amazonlinux 97 k
2025-05-07T19:42:49.5729543Z perl-PathTools x86_64 3.78-459.amzn2023.0.2 amazonlinux 85 k
2025-05-07T19:42:49.5730112Z perl-Pod-Escapes noarch 1:1.07-458.amzn2023.0.2 amazonlinux 20 k
2025-05-07T19:42:49.5730725Z perl-Pod-Perldoc noarch 3.28.01-459.amzn2023.0.3 amazonlinux 84 k
2025-05-07T19:42:49.5731294Z perl-Pod-Simple noarch 1:3.42-2.amzn2023.0.2 amazonlinux 215 k
2025-05-07T19:42:49.5731885Z perl-Pod-Usage noarch 4:2.01-2.amzn2023.0.2 amazonlinux 41 k
2025-05-07T19:42:49.5732490Z perl-Scalar-List-Utils x86_64 4:1.56-459.amzn2023.0.2 amazonlinux 71 k
2025-05-07T19:42:49.5733088Z perl-SelectSaver noarch 1.02-477.amzn2023.0.6 amazonlinux 12 k
2025-05-07T19:42:49.5733661Z perl-Socket x86_64 4:2.032-1.amzn2023.0.2 amazonlinux 55 k
2025-05-07T19:42:49.5734191Z perl-Storable x86_64 1:3.21-458.amzn2023.0.2 amazonlinux 96 k
2025-05-07T19:42:49.5734748Z perl-Symbol noarch 1.08-477.amzn2023.0.6 amazonlinux 15 k
2025-05-07T19:42:49.5735342Z perl-Term-ANSIColor noarch 5.01-459.amzn2023.0.2 amazonlinux 48 k
2025-05-07T19:42:49.5735918Z perl-Term-Cap noarch 1.17-458.amzn2023.0.2 amazonlinux 22 k
2025-05-07T19:42:49.5736498Z perl-TermReadKey x86_64 2.38-9.amzn2023.0.2 amazonlinux 36 k
2025-05-07T19:42:49.5737204Z perl-Text-ParseWords noarch 3.30-458.amzn2023.0.2 amazonlinux 17 k
2025-05-07T19:42:49.5738095Z perl-Text-Tabs+Wrap noarch 2021.0726-1.amzn2023.0.1 amazonlinux 22 k
2025-05-07T19:42:49.5738817Z perl-Time-Local noarch 2:1.300-5.amzn2023.0.2 amazonlinux 34 k
2025-05-07T19:42:49.5739421Z perl-URI noarch 5.09-1.amzn2023.0.2 amazonlinux 108 k
2025-05-07T19:42:49.5739989Z perl-base noarch 2.27-477.amzn2023.0.6 amazonlinux 17 k
2025-05-07T19:42:49.5740571Z perl-constant noarch 1.33-459.amzn2023.0.2 amazonlinux 23 k
2025-05-07T19:42:49.5741168Z perl-if noarch 0.60.800-477.amzn2023.0.6 amazonlinux 14 k
2025-05-07T19:42:49.5741749Z perl-interpreter x86_64 4:5.32.1-477.amzn2023.0.6 amazonlinux 71 k
2025-05-07T19:42:49.5742338Z perl-lib x86_64 0.65-477.amzn2023.0.6 amazonlinux 15 k
2025-05-07T19:42:49.5742905Z perl-libnet noarch 3.13-2.amzn2023.0.2 amazonlinux 126 k
2025-05-07T19:42:49.5743459Z perl-libs x86_64 4:5.32.1-477.amzn2023.0.6 amazonlinux 2.0 M
2025-05-07T19:42:49.5744080Z perl-mro x86_64 1.23-477.amzn2023.0.6 amazonlinux 29 k
2025-05-07T19:42:49.5744692Z perl-overload noarch 1.31-477.amzn2023.0.6 amazonlinux 46 k
2025-05-07T19:42:49.5745312Z perl-overloading noarch 0.02-477.amzn2023.0.6 amazonlinux 13 k
2025-05-07T19:42:49.5745947Z perl-parent noarch 1:0.238-458.amzn2023.0.2 amazonlinux 14 k
2025-05-07T19:42:49.5746540Z perl-podlators noarch 1:4.14-458.amzn2023.0.2 amazonlinux 112 k
2025-05-07T19:42:49.5747139Z perl-subs noarch 1.03-477.amzn2023.0.6 amazonlinux 12 k
2025-05-07T19:42:49.5747717Z perl-vars noarch 1.05-477.amzn2023.0.6 amazonlinux 13 k
2025-05-07T19:42:49.5748279Z shadow-utils x86_64 2:4.9-12.amzn2023.0.4 amazonlinux 1.1 M
2025-05-07T19:42:49.5748881Z systemd-libs x86_64 252.23-3.amzn2023 amazonlinux 613 k
2025-05-07T19:42:49.5749428Z util-linux x86_64 2.37.4-1.amzn2023.0.4 amazonlinux 2.2 M
2025-05-07T19:42:49.5750113Z util-linux-core x86_64 2.37.4-1.amzn2023.0.4 amazonlinux 432 k
2025-05-07T19:42:49.5750539Z Installing weak dependencies:
2025-05-07T19:42:49.5750993Z nano-default-editor noarch 8.3-1.amzn2023 amazonlinux 10 k
2025-05-07T19:42:49.5751599Z perl-IO-Socket-IP noarch 0.41-3.amzn2023.0.2 amazonlinux 42 k
2025-05-07T19:42:49.5752175Z perl-IO-Socket-SSL noarch 2.075-1.amzn2023.0.2 amazonlinux 218 k
2025-05-07T19:42:49.5752763Z perl-Mozilla-CA noarch 20200520-4.amzn2023.0.2 amazonlinux 13 k
2025-05-07T19:42:49.5753306Z perl-NDBM_File x86_64 1.15-477.amzn2023.0.6 amazonlinux 23 k
2025-05-07T19:42:49.5753877Z sudo-python-plugin x86_64 1.9.15-1.p5.amzn2023.0.1 amazonlinux 56 k
2025-05-07T19:42:49.5754216Z
2025-05-07T19:42:49.5754342Z Transaction Summary
2025-05-07T19:42:49.5754622Z ========================================================================================
2025-05-07T19:42:49.5754966Z Install 107 Packages
2025-05-07T19:42:49.5755116Z
2025-05-07T19:42:49.5755223Z Total download size: 38 M
2025-05-07T19:42:49.5755502Z Installed size: 151 M
2025-05-07T19:42:49.5755747Z Downloading Packages:
2025-05-07T19:42:49.8689396Z (1/107): cyrus-sasl-lib-2.1.27-18.amzn2023.0.3. 31 MB/s | 786 kB 00:00
2025-05-07T19:42:49.8800603Z (2/107): cracklib-2.9.6-27.amzn2023.0.2.x86_64. 2.2 MB/s | 82 kB 00:00
2025-05-07T19:42:49.8829295Z (3/107): elfutils-debuginfod-client-0.188-3.amz 5.5 MB/s | 41 kB 00:00
2025-05-07T19:42:49.9044233Z (4/107): binutils-2.41-50.amzn2023.0.3.x86_64.r 87 MB/s | 5.3 MB 00:00
2025-05-07T19:42:49.9093714Z (5/107): findutils-4.8.0-2.amzn2023.0.2.x86_64. 19 MB/s | 539 kB 00:00
2025-05-07T19:42:49.9107901Z (6/107): git-2.47.1-1.amzn2023.0.2.x86_64.rpm 2.0 MB/s | 54 kB 00:00
2025-05-07T19:42:49.9330306Z (7/107): gnutls-3.8.3-6.amzn2023.0.1.x86_64.rpm 56 MB/s | 1.1 MB 00:00
2025-05-07T19:42:49.9546518Z (8/107): git-core-2.47.1-1.amzn2023.0.2.x86_64. 96 MB/s | 4.7 MB 00:00
2025-05-07T19:42:49.9679227Z (9/107): git-core-doc-2.47.1-1.amzn2023.0.2.noa 51 MB/s | 2.8 MB 00:00
2025-05-07T19:42:49.9751145Z (10/107): groff-base-1.22.4-7.amzn2023.0.2.x86_ 26 MB/s | 1.0 MB 00:00
2025-05-07T19:42:49.9803618Z (11/107): gzip-1.12-1.amzn2023.0.1.x86_64.rpm 15 MB/s | 160 kB 00:00
2025-05-07T19:42:49.9890364Z (12/107): hwdata-0.384-1.amzn2023.0.3.noarch.rp 85 MB/s | 1.6 MB 00:00
2025-05-07T19:42:49.9898535Z (13/107): jansson-2.14-0.amzn2023.x86_64.rpm 3.3 MB/s | 46 kB 00:00
2025-05-07T19:42:49.9920925Z (14/107): kmod-libs-29-2.amzn2023.0.5.x86_64.rp 5.7 MB/s | 62 kB 00:00
2025-05-07T19:42:49.9969904Z (15/107): libcbor-0.7.0-3.amzn2023.0.2.x86_64.r 9.2 MB/s | 57 kB 00:00
2025-05-07T19:42:49.9998095Z (16/107): less-608-2.amzn2023.0.2.x86_64.rpm 18 MB/s | 168 kB 00:00
2025-05-07T19:42:50.0037122Z (17/107): libeconf-0.4.0-1.amzn2023.0.3.x86_64. 4.7 MB/s | 28 kB 00:00
2025-05-07T19:42:50.0095736Z (18/107): libdb-5.3.28-49.amzn2023.0.2.x86_64.r 43 MB/s | 756 kB 00:00
2025-05-07T19:42:50.0118806Z (19/107): libedit-3.1-38.20210714cvs.amzn2023.0 8.9 MB/s | 108 kB 00:00
2025-05-07T19:42:50.0142956Z (20/107): libfdisk-2.37.4-1.amzn2023.0.4.x86_64 16 MB/s | 153 kB 00:00
2025-05-07T19:42:50.0169831Z (21/107): libfido2-1.10.0-2.amzn2023.0.2.x86_64 14 MB/s | 95 kB 00:00
2025-05-07T19:42:50.0219844Z (22/107): libpwquality-1.4.4-6.amzn2023.0.2.x86 15 MB/s | 106 kB 00:00
2025-05-07T19:42:50.0247787Z (23/107): libsemanage-3.4-5.amzn2023.0.2.x86_64 16 MB/s | 121 kB 00:00
2025-05-07T19:42:50.0264684Z (24/107): libmetalink-0.1.3-14.amzn2023.0.2.x86 2.6 MB/s | 31 kB 00:00
2025-05-07T19:42:50.0279996Z (25/107): libutempter-1.2.1-4.amzn2023.0.2.x86_ 4.4 MB/s | 26 kB 00:00
2025-05-07T19:42:50.0328121Z (26/107): nano-default-editor-8.3-1.amzn2023.no 1.7 MB/s | 10 kB 00:00
2025-05-07T19:42:50.0382427Z (27/107): nano-8.3-1.amzn2023.x86_64.rpm 53 MB/s | 706 kB 00:00
2025-05-07T19:42:50.0416067Z (28/107): ncurses-6.2-4.20200222.amzn2023.0.6.x 29 MB/s | 394 kB 00:00
2025-05-07T19:42:50.0469935Z (29/107): nettle-3.10.1-1.amzn2023.0.1.x86_64.r 44 MB/s | 573 kB 00:00
2025-05-07T19:42:50.0515795Z (30/107): openldap-2.4.57-6.amzn2023.0.7.x86_64 31 MB/s | 256 kB 00:00
2025-05-07T19:42:50.0555109Z (31/107): openssh-8.7p1-8.amzn2023.0.14.x86_64. 37 MB/s | 454 kB 00:00
2025-05-07T19:42:50.0608196Z (32/107): openssh-clients-8.7p1-8.amzn2023.0.14 52 MB/s | 708 kB 00:00
2025-05-07T19:42:50.0655256Z (33/107): pam-1.5.1-8.amzn2023.0.4.x86_64.rpm 41 MB/s | 542 kB 00:00
2025-05-07T19:42:50.0673865Z (34/107): pciutils-3.7.0-3.amzn2023.0.2.x86_64. 8.5 MB/s | 93 kB 00:00
2025-05-07T19:42:50.0692258Z (35/107): pciutils-libs-3.7.0-3.amzn2023.0.2.x8 5.5 MB/s | 41 kB 00:00
2025-05-07T19:42:50.0714322Z (36/107): perl-AutoLoader-5.74-477.amzn2023.0.6 4.0 MB/s | 22 kB 00:00
2025-05-07T19:42:50.0751473Z (37/107): perl-Carp-1.50-458.amzn2023.0.2.noarc 5.9 MB/s | 29 kB 00:00
2025-05-07T19:42:50.0779172Z (38/107): perl-B-1.80-477.amzn2023.0.6.x86_64.r 23 MB/s | 179 kB 00:00
2025-05-07T19:42:50.0791481Z (39/107): perl-Class-Struct-0.66-477.amzn2023.0 2.9 MB/s | 22 kB 00:00
2025-05-07T19:42:50.0815305Z (40/107): perl-Data-Dumper-2.174-460.amzn2023.0 9.3 MB/s | 55 kB 00:00
2025-05-07T19:42:50.0848071Z (41/107): perl-Digest-1.20-1.amzn2023.0.2.noarc 5.5 MB/s | 26 kB 00:00
2025-05-07T19:42:50.0866728Z (42/107): perl-Digest-MD5-2.58-2.amzn2023.0.2.x 5.3 MB/s | 36 kB 00:00
2025-05-07T19:42:50.0878640Z (43/107): perl-DynaLoader-1.47-477.amzn2023.0.6 4.0 MB/s | 26 kB 00:00
2025-05-07T19:42:50.0996199Z (44/107): perl-Encode-3.15-462.amzn2023.0.2.x86 119 MB/s | 1.7 MB 00:00
2025-05-07T19:42:50.1013170Z (45/107): perl-Errno-1.30-477.amzn2023.0.6.x86_ 1.2 MB/s | 15 kB 00:00
2025-05-07T19:42:50.1025735Z (46/107): perl-Error-0.17029-5.amzn2023.0.2.noa 2.9 MB/s | 41 kB 00:00
2025-05-07T19:42:50.1052546Z (47/107): perl-Exporter-5.74-459.amzn2023.0.2.n 6.1 MB/s | 31 kB 00:00
2025-05-07T19:42:50.1075429Z (48/107): perl-Fcntl-1.13-477.amzn2023.0.6.x86_ 4.6 MB/s | 21 kB 00:00
2025-05-07T19:42:50.1102619Z (49/107): perl-File-Basename-2.85-477.amzn2023. 2.6 MB/s | 18 kB 00:00
2025-05-07T19:42:50.1123067Z (50/107): perl-File-Find-1.37-477.amzn2023.0.6. 3.7 MB/s | 26 kB 00:00
2025-05-07T19:42:50.1141296Z (51/107): perl-File-Path-2.18-2.amzn2023.0.2.no 5.7 MB/s | 36 kB 00:00
2025-05-07T19:42:50.1163791Z (52/107): perl-File-Temp-0.231.100-2.amzn2023.0 11 MB/s | 60 kB 00:00
2025-05-07T19:42:50.1178627Z (53/107): perl-File-stat-1.09-477.amzn2023.0.6. 3.3 MB/s | 17 kB 00:00
2025-05-07T19:42:50.1217586Z (54/107): perl-FileHandle-2.03-477.amzn2023.0.6 2.3 MB/s | 16 kB 00:00
2025-05-07T19:42:50.1244610Z (55/107): perl-Getopt-Long-2.52-2.amzn2023.0.2. 7.8 MB/s | 60 kB 00:00
2025-05-07T19:42:50.1254752Z (56/107): perl-Getopt-Std-1.12-477.amzn2023.0.6 2.1 MB/s | 16 kB 00:00
2025-05-07T19:42:50.1276883Z (57/107): perl-Git-2.47.1-1.amzn2023.0.2.noarch 8.1 MB/s | 42 kB 00:00
2025-05-07T19:42:50.1323289Z (58/107): perl-HTTP-Tiny-0.078-1.amzn2023.0.3.n 9.6 MB/s | 56 kB 00:00
2025-05-07T19:42:50.1340150Z (59/107): perl-IO-1.43-477.amzn2023.0.6.x86_64. 12 MB/s | 87 kB 00:00
2025-05-07T19:42:50.1364806Z (60/107): perl-IO-Socket-IP-0.41-3.amzn2023.0.2 5.2 MB/s | 42 kB 00:00
2025-05-07T19:42:50.1395946Z (61/107): perl-IPC-Open3-1.21-477.amzn2023.0.6. 4.9 MB/s | 23 kB 00:00
2025-05-07T19:42:50.1430146Z (62/107): perl-IO-Socket-SSL-2.075-1.amzn2023.0 26 MB/s | 218 kB 00:00
2025-05-07T19:42:50.1448659Z (63/107): perl-MIME-Base64-3.16-2.amzn2023.0.2. 3.8 MB/s | 31 kB 00:00
2025-05-07T19:42:50.1467310Z (64/107): perl-Mozilla-CA-20200520-4.amzn2023.0 2.0 MB/s | 13 kB 00:00
2025-05-07T19:42:50.1486323Z (65/107): perl-NDBM_File-1.15-477.amzn2023.0.6. 4.3 MB/s | 23 kB 00:00
2025-05-07T19:42:50.1543313Z (66/107): perl-Net-SSLeay-1.94-1.amzn2023.0.1.x 43 MB/s | 392 kB 00:00
2025-05-07T19:42:50.1569048Z (67/107): perl-POSIX-1.94-477.amzn2023.0.6.x86_ 10 MB/s | 97 kB 00:00
2025-05-07T19:42:50.1588979Z (68/107): perl-PathTools-3.78-459.amzn2023.0.2. 8.2 MB/s | 85 kB 00:00
2025-05-07T19:42:50.1609831Z (69/107): perl-Pod-Escapes-1.07-458.amzn2023.0. 3.4 MB/s | 20 kB 00:00
2025-05-07T19:42:50.1632226Z (70/107): perl-Pod-Perldoc-3.28.01-459.amzn2023 14 MB/s | 84 kB 00:00
2025-05-07T19:42:50.1663055Z (71/107): perl-Pod-Simple-3.42-2.amzn2023.0.2.n 31 MB/s | 215 kB 00:00
2025-05-07T19:42:50.1680588Z (72/107): perl-Pod-Usage-2.01-2.amzn2023.0.2.no 6.0 MB/s | 41 kB 00:00
2025-05-07T19:42:50.1703264Z (73/107): perl-Scalar-List-Utils-1.56-459.amzn2 11 MB/s | 71 kB 00:00
2025-05-07T19:42:50.1719469Z (74/107): perl-SelectSaver-1.02-477.amzn2023.0. 2.4 MB/s | 12 kB 00:00
2025-05-07T19:42:50.1740226Z (75/107): perl-Socket-2.032-1.amzn2023.0.2.x86_ 10 MB/s | 55 kB 00:00
2025-05-07T19:42:50.1771183Z (76/107): perl-Storable-3.21-458.amzn2023.0.2.x 15 MB/s | 96 kB 00:00
2025-05-07T19:42:50.1785212Z (77/107): perl-Symbol-1.08-477.amzn2023.0.6.noa 2.4 MB/s | 15 kB 00:00
2025-05-07T19:42:50.1811929Z (78/107): perl-Term-ANSIColor-5.01-459.amzn2023 7.1 MB/s | 48 kB 00:00
2025-05-07T19:42:50.1826407Z (79/107): perl-Term-Cap-1.17-458.amzn2023.0.2.n 4.3 MB/s | 22 kB 00:00
2025-05-07T19:42:50.1850505Z (80/107): perl-TermReadKey-2.38-9.amzn2023.0.2. 6.6 MB/s | 36 kB 00:00
2025-05-07T19:42:50.1871727Z (81/107): perl-Text-ParseWords-3.30-458.amzn202 2.9 MB/s | 17 kB 00:00
2025-05-07T19:42:50.1900184Z (82/107): perl-Text-Tabs+Wrap-2021.0726-1.amzn2 3.2 MB/s | 22 kB 00:00
2025-05-07T19:42:50.1918487Z (83/107): perl-Time-Local-1.300-5.amzn2023.0.2. 5.2 MB/s | 34 kB 00:00
2025-05-07T19:42:50.1944648Z (84/107): perl-URI-5.09-1.amzn2023.0.2.noarch.r 16 MB/s | 108 kB 00:00
2025-05-07T19:42:50.1963662Z (85/107): perl-base-2.27-477.amzn2023.0.6.noarc 2.8 MB/s | 17 kB 00:00
2025-05-07T19:42:50.1994297Z (86/107): perl-if-0.60.800-477.amzn2023.0.6.noa 3.2 MB/s | 14 kB 00:00
2025-05-07T19:42:50.2012221Z (87/107): perl-constant-1.33-459.amzn2023.0.2.n 2.7 MB/s | 23 kB 00:00
2025-05-07T19:42:50.2034358Z (88/107): perl-interpreter-5.32.1-477.amzn2023. 11 MB/s | 71 kB 00:00
2025-05-07T19:42:50.2051317Z (89/107): perl-lib-0.65-477.amzn2023.0.6.x86_64 2.8 MB/s | 15 kB 00:00
2025-05-07T19:42:50.2077037Z (90/107): perl-libnet-3.13-2.amzn2023.0.2.noarc 20 MB/s | 126 kB 00:00
2025-05-07T19:42:50.2100359Z (91/107): perl-mro-1.23-477.amzn2023.0.6.x86_64 6.4 MB/s | 29 kB 00:00
2025-05-07T19:42:50.2241480Z (92/107): perl-libs-5.32.1-477.amzn2023.0.6.x86 101 MB/s | 2.0 MB 00:00
2025-05-07T19:42:50.2260188Z (93/107): perl-overload-1.31-477.amzn2023.0.6.n 2.6 MB/s | 46 kB 00:00
2025-05-07T19:42:50.2271232Z (94/107): perl-overloading-0.02-477.amzn2023.0. 936 kB/s | 13 kB 00:00
2025-05-07T19:42:50.2289455Z (95/107): perl-parent-0.238-458.amzn2023.0.2.no 3.3 MB/s | 14 kB 00:00
2025-05-07T19:42:50.2333113Z (96/107): perl-podlators-4.14-458.amzn2023.0.2. 20 MB/s | 112 kB 00:00
2025-05-07T19:42:50.2340099Z (97/107): perl-subs-1.03-477.amzn2023.0.6.noarc 1.8 MB/s | 12 kB 00:00
2025-05-07T19:42:50.2362494Z (98/107): perl-vars-1.05-477.amzn2023.0.6.noarc 1.9 MB/s | 13 kB 00:00
2025-05-07T19:42:50.2468210Z (99/107): sudo-1.9.15-1.p5.amzn2023.0.1.x86_64. 105 MB/s | 1.3 MB 00:00
2025-05-07T19:42:50.2546815Z (100/107): shadow-utils-4.9-12.amzn2023.0.4.x86 56 MB/s | 1.1 MB 00:00
2025-05-07T19:42:50.2556397Z (101/107): sudo-python-plugin-1.9.15-1.p5.amzn2 2.8 MB/s | 56 kB 00:00
2025-05-07T19:42:50.2614099Z (102/107): systemd-libs-252.23-3.amzn2023.x86_6 49 MB/s | 613 kB 00:00
2025-05-07T19:42:50.2679669Z (103/107): tar-1.34-1.amzn2023.0.4.x86_64.rpm 79 MB/s | 879 kB 00:00
2025-05-07T19:42:50.2829719Z (104/107): util-linux-2.37.4-1.amzn2023.0.4.x86 85 MB/s | 2.2 MB 00:00
2025-05-07T19:42:50.2878370Z (105/107): util-linux-core-2.37.4-1.amzn2023.0. 17 MB/s | 432 kB 00:00
2025-05-07T19:42:50.2927394Z (106/107): wget-1.21.3-1.amzn2023.0.4.x86_64.rp 36 MB/s | 779 kB 00:00
2025-05-07T19:42:50.2947779Z (107/107): which-2.21-26.amzn2023.0.2.x86_64.rp 4.1 MB/s | 42 kB 00:00
2025-05-07T19:42:50.2968640Z --------------------------------------------------------------------------------
2025-05-07T19:42:50.2969199Z Total 51 MB/s | 38 MB 00:00
2025-05-07T19:42:51.3525039Z Running transaction check
2025-05-07T19:42:51.3979807Z Transaction check succeeded.
2025-05-07T19:42:51.3980697Z Running transaction test
2025-05-07T19:42:51.7648499Z Transaction test succeeded.
2025-05-07T19:42:51.7648956Z Running transaction
2025-05-07T19:42:52.5128300Z Preparing : 1/1
2025-05-07T19:42:52.5299673Z Installing : systemd-libs-252.23-3.amzn2023.x86_64 1/107
2025-05-07T19:42:52.5545944Z Installing : nettle-3.10.1-1.amzn2023.0.1.x86_64 2/107
2025-05-07T19:42:52.5765308Z Installing : gnutls-3.8.3-6.amzn2023.0.1.x86_64 3/107
2025-05-07T19:42:52.5844175Z Installing : util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 4/107
2025-05-07T19:42:52.5909721Z Running scriptlet: util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 4/107
2025-05-07T19:42:52.6017851Z Installing : pciutils-libs-3.7.0-3.amzn2023.0.2.x86_64 5/107
2025-05-07T19:42:52.6310420Z Installing : ncurses-6.2-4.20200222.amzn2023.0.6.x86_64 6/107
2025-05-07T19:42:52.6395648Z Installing : nano-8.3-1.amzn2023.x86_64 7/107
2025-05-07T19:42:52.6460660Z Installing : nano-default-editor-8.3-1.amzn2023.noarch 8/107
2025-05-07T19:42:52.6974901Z Installing : libsemanage-3.4-5.amzn2023.0.2.x86_64 9/107
2025-05-07T19:42:52.7070468Z Installing : shadow-utils-2:4.9-12.amzn2023.0.4.x86_64 10/107
2025-05-07T19:42:52.7514472Z Running scriptlet: libutempter-1.2.1-4.amzn2023.0.2.x86_64 11/107
2025-05-07T19:42:52.7579592Z Installing : libutempter-1.2.1-4.amzn2023.0.2.x86_64 11/107
2025-05-07T19:42:52.7650530Z Installing : libmetalink-0.1.3-14.amzn2023.0.2.x86_64 12/107
2025-05-07T19:42:52.7718417Z Installing : libfdisk-2.37.4-1.amzn2023.0.4.x86_64 13/107
2025-05-07T19:42:52.7781404Z Installing : libedit-3.1-38.20210714cvs.amzn2023.0.2.x86_64 14/107
2025-05-07T19:42:52.7934005Z Installing : libeconf-0.4.0-1.amzn2023.0.3.x86_64 15/107
2025-05-07T19:42:52.7999096Z Installing : libdb-5.3.28-49.amzn2023.0.2.x86_64 16/107
2025-05-07T19:42:52.8062491Z Installing : libcbor-0.7.0-3.amzn2023.0.2.x86_64 17/107
2025-05-07T19:42:52.8148308Z Installing : libfido2-1.10.0-2.amzn2023.0.2.x86_64 18/107
2025-05-07T19:42:52.8220322Z Installing : less-608-2.amzn2023.0.2.x86_64 19/107
2025-05-07T19:42:52.8280052Z Installing : kmod-libs-29-2.amzn2023.0.5.x86_64 20/107
2025-05-07T19:42:52.8712367Z Installing : jansson-2.14-0.amzn2023.x86_64 21/107
2025-05-07T19:42:52.8802055Z Installing : hwdata-0.384-1.amzn2023.0.3.noarch 22/107
2025-05-07T19:42:52.8958589Z Installing : gzip-1.12-1.amzn2023.0.1.x86_64 23/107
2025-05-07T19:42:52.9409238Z Installing : cracklib-2.9.6-27.amzn2023.0.2.x86_64 24/107
2025-05-07T19:42:52.9600386Z Installing : pam-1.5.1-8.amzn2023.0.4.x86_64 25/107
2025-05-07T19:42:53.0412651Z Installing : libpwquality-1.4.4-6.amzn2023.0.2.x86_64 26/107
2025-05-07T19:42:53.0414336Z Installing : util-linux-2.37.4-1.amzn2023.0.4.x86_64 27/107
2025-05-07T19:42:53.0415730Z warning: /etc/adjtime created as /etc/adjtime.rpmnew
2025-05-07T19:42:53.0416488Z
2025-05-07T19:42:53.0611108Z Running scriptlet: util-linux-2.37.4-1.amzn2023.0.4.x86_64 27/107
2025-05-07T19:42:53.0947752Z Running scriptlet: openssh-8.7p1-8.amzn2023.0.14.x86_64 28/107
2025-05-07T19:42:53.1143908Z Installing : openssh-8.7p1-8.amzn2023.0.14.x86_64 28/107
2025-05-07T19:42:53.1216176Z Installing : openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 29/107
2025-05-07T19:42:53.2317795Z Running scriptlet: openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 29/107
2025-05-07T19:42:53.3800857Z Installing : git-core-2.47.1-1.amzn2023.0.2.x86_64 30/107
2025-05-07T19:42:53.3935140Z Installing : git-core-doc-2.47.1-1.amzn2023.0.2.noarch 31/107
2025-05-07T19:42:53.4337279Z Running scriptlet: groff-base-1.22.4-7.amzn2023.0.2.x86_64 32/107
2025-05-07T19:42:53.4424470Z Installing : groff-base-1.22.4-7.amzn2023.0.2.x86_64 32/107
2025-05-07T19:42:53.4500353Z Running scriptlet: groff-base-1.22.4-7.amzn2023.0.2.x86_64 32/107
2025-05-07T19:42:53.4574574Z Installing : perl-Digest-1.20-1.amzn2023.0.2.noarch 33/107
2025-05-07T19:42:53.4664212Z Installing : perl-Digest-MD5-2.58-2.amzn2023.0.2.x86_64 34/107
2025-05-07T19:42:53.4722487Z Installing : perl-B-1.80-477.amzn2023.0.6.x86_64 35/107
2025-05-07T19:42:53.4768896Z Installing : perl-FileHandle-2.03-477.amzn2023.0.6.noarch 36/107
2025-05-07T19:42:53.4820817Z Installing : perl-AutoLoader-5.74-477.amzn2023.0.6.noarch 37/107
2025-05-07T19:42:53.4916470Z Installing : perl-Data-Dumper-2.174-460.amzn2023.0.2.x86_64 38/107
2025-05-07T19:42:53.4980752Z Installing : perl-libnet-3.13-2.amzn2023.0.2.noarch 39/107
2025-05-07T19:42:53.5078819Z Installing : perl-base-2.27-477.amzn2023.0.6.noarch 40/107
2025-05-07T19:42:53.5296335Z Installing : perl-URI-5.09-1.amzn2023.0.2.noarch 41/107
2025-05-07T19:42:53.5380233Z Installing : perl-Net-SSLeay-1.94-1.amzn2023.0.1.x86_64 42/107
2025-05-07T19:42:53.5435727Z Installing : perl-Text-Tabs+Wrap-2021.0726-1.amzn2023.0.1.noa 43/107
2025-05-07T19:42:53.5480498Z Installing : perl-Mozilla-CA-20200520-4.amzn2023.0.2.noarch 44/107
2025-05-07T19:42:53.5541162Z Installing : perl-if-0.60.800-477.amzn2023.0.6.noarch 45/107
2025-05-07T19:42:53.5600655Z Installing : perl-IO-Socket-IP-0.41-3.amzn2023.0.2.noarch 46/107
2025-05-07T19:42:53.5655962Z Installing : perl-Time-Local-2:1.300-5.amzn2023.0.2.noarch 47/107
2025-05-07T19:42:53.5742352Z Installing : perl-File-Path-2.18-2.amzn2023.0.2.noarch 48/107
2025-05-07T19:42:53.5807243Z Installing : perl-IO-Socket-SSL-2.075-1.amzn2023.0.2.noarch 49/107
2025-05-07T19:42:53.5852652Z Installing : perl-Pod-Escapes-1:1.07-458.amzn2023.0.2.noarch 50/107
2025-05-07T19:42:53.5911002Z Installing : perl-Class-Struct-0.66-477.amzn2023.0.6.noarch 51/107
2025-05-07T19:42:53.5970898Z Installing : perl-POSIX-1.94-477.amzn2023.0.6.x86_64 52/107
2025-05-07T19:42:53.6032512Z Installing : perl-Term-ANSIColor-5.01-459.amzn2023.0.2.noarch 53/107
2025-05-07T19:42:53.6080010Z Installing : perl-IPC-Open3-1.21-477.amzn2023.0.6.noarch 54/107
2025-05-07T19:42:53.6141149Z Installing : perl-subs-1.03-477.amzn2023.0.6.noarch 55/107
2025-05-07T19:42:53.6207915Z Installing : perl-File-Temp-1:0.231.100-2.amzn2023.0.2.noarch 56/107
2025-05-07T19:42:53.6263596Z Installing : perl-HTTP-Tiny-0.078-1.amzn2023.0.3.noarch 57/107
2025-05-07T19:42:53.6374900Z Installing : perl-Term-Cap-1.17-458.amzn2023.0.2.noarch 58/107
2025-05-07T19:42:53.6456852Z Installing : perl-Pod-Simple-1:3.42-2.amzn2023.0.2.noarch 59/107
2025-05-07T19:42:53.6512895Z Installing : perl-Socket-4:2.032-1.amzn2023.0.2.x86_64 60/107
2025-05-07T19:42:53.6560196Z Installing : perl-SelectSaver-1.02-477.amzn2023.0.6.noarch 61/107
2025-05-07T19:42:53.6603808Z Installing : perl-Symbol-1.08-477.amzn2023.0.6.noarch 62/107
2025-05-07T19:42:53.6681073Z Installing : perl-File-stat-1.09-477.amzn2023.0.6.noarch 63/107
2025-05-07T19:42:53.6781187Z Installing : perl-podlators-1:4.14-458.amzn2023.0.2.noarch 64/107
2025-05-07T19:42:53.6855916Z Installing : perl-Pod-Perldoc-3.28.01-459.amzn2023.0.3.noarch 65/107
2025-05-07T19:42:53.6914901Z Installing : perl-Fcntl-1.13-477.amzn2023.0.6.x86_64 66/107
2025-05-07T19:42:53.6973774Z Installing : perl-Text-ParseWords-3.30-458.amzn2023.0.2.noarc 67/107
2025-05-07T19:42:53.7049027Z Installing : perl-mro-1.23-477.amzn2023.0.6.x86_64 68/107
2025-05-07T19:42:53.7110758Z Installing : perl-IO-1.43-477.amzn2023.0.6.x86_64 69/107
2025-05-07T19:42:53.7170319Z Installing : perl-overloading-0.02-477.amzn2023.0.6.noarch 70/107
2025-05-07T19:42:53.7240096Z Installing : perl-Pod-Usage-4:2.01-2.amzn2023.0.2.noarch 71/107
2025-05-07T19:42:53.7289698Z Installing : perl-Errno-1.30-477.amzn2023.0.6.x86_64 72/107
2025-05-07T19:42:53.7340868Z Installing : perl-File-Basename-2.85-477.amzn2023.0.6.noarch 73/107 2025-05-07T19:42:53.7403160Z Installing : perl-Getopt-Std-1.12-477.amzn2023.0.6.noarch 74/107 2025-05-07T19:42:53.7481641Z Installing : perl-MIME-Base64-3.16-2.amzn2023.0.2.x86_64 75/107 2025-05-07T19:42:53.7564188Z Installing : perl-Scalar-List-Utils-4:1.56-459.amzn2023.0.2.x 76/107 2025-05-07T19:42:53.7628906Z Installing : perl-constant-1.33-459.amzn2023.0.2.noarch 77/107 2025-05-07T19:42:53.7694474Z Installing : perl-Storable-1:3.21-458.amzn2023.0.2.x86_64 78/107 2025-05-07T19:42:53.7741574Z Installing : perl-overload-1.31-477.amzn2023.0.6.noarch 79/107 2025-05-07T19:42:53.7789989Z Installing : perl-parent-1:0.238-458.amzn2023.0.2.noarch 80/107 2025-05-07T19:42:53.7855166Z Installing : perl-vars-1.05-477.amzn2023.0.6.noarch 81/107 2025-05-07T19:42:53.7912033Z Installing : perl-Getopt-Long-1:2.52-2.amzn2023.0.2.noarch 82/107 2025-05-07T19:42:53.7968865Z Installing : perl-DynaLoader-1.47-477.amzn2023.0.6.x86_64 83/107 2025-05-07T19:42:53.8024031Z Installing : perl-Carp-1.50-458.amzn2023.0.2.noarch 84/107 2025-05-07T19:42:53.8078734Z Installing : perl-Exporter-5.74-459.amzn2023.0.2.noarch 85/107 2025-05-07T19:42:53.8156969Z Installing : perl-NDBM_File-1.15-477.amzn2023.0.6.x86_64 86/107 2025-05-07T19:42:53.8692524Z Installing : perl-PathTools-3.78-459.amzn2023.0.2.x86_64 87/107 2025-05-07T19:42:53.9653475Z Installing : perl-Encode-4:3.15-462.amzn2023.0.2.x86_64 88/107 2025-05-07T19:42:53.9782058Z Installing : perl-libs-4:5.32.1-477.amzn2023.0.6.x86_64 89/107 2025-05-07T19:42:53.9863524Z Installing : perl-interpreter-4:5.32.1-477.amzn2023.0.6.x86_6 90/107 2025-05-07T19:42:53.9935558Z Installing : perl-Error-1:0.17029-5.amzn2023.0.2.noarch 91/107 2025-05-07T19:42:54.0001987Z Installing : perl-File-Find-1.37-477.amzn2023.0.6.noarch 92/107 2025-05-07T19:42:54.0074607Z Installing : perl-TermReadKey-2.38-9.amzn2023.0.2.x86_64 93/107 2025-05-07T19:42:54.0127583Z Installing : 
perl-lib-0.65-477.amzn2023.0.6.x86_64 94/107 2025-05-07T19:42:54.0191978Z Installing : perl-Git-2.47.1-1.amzn2023.0.2.noarch 95/107 2025-05-07T19:42:54.0260765Z Installing : git-2.47.1-1.amzn2023.0.2.x86_64 96/107 2025-05-07T19:42:54.0464555Z Installing : elfutils-debuginfod-client-0.188-3.amzn2023.0.2. 97/107 2025-05-07T19:42:54.0589106Z Installing : cyrus-sasl-lib-2.1.27-18.amzn2023.0.3.x86_64 98/107 2025-05-07T19:42:54.0679831Z Installing : openldap-2.4.57-6.amzn2023.0.7.x86_64 99/107 2025-05-07T19:42:54.1080522Z Installing : sudo-python-plugin-1.9.15-1.p5.amzn2023.0.1.x86_ 100/107 2025-05-07T19:42:54.2297651Z Installing : sudo-1.9.15-1.p5.amzn2023.0.1.x86_64 101/107 2025-05-07T19:42:54.2383603Z Installing : binutils-2.41-50.amzn2023.0.3.x86_64 102/107 2025-05-07T19:42:54.2490735Z Running scriptlet: binutils-2.41-50.amzn2023.0.3.x86_64 102/107 2025-05-07T19:42:54.2788539Z Installing : pciutils-3.7.0-3.amzn2023.0.2.x86_64 103/107 2025-05-07T19:42:54.2886341Z Installing : wget-1.21.3-1.amzn2023.0.4.x86_64 104/107 2025-05-07T19:42:54.3132097Z Installing : which-2.21-26.amzn2023.0.2.x86_64 105/107 2025-05-07T19:42:54.3346449Z Installing : tar-2:1.34-1.amzn2023.0.4.x86_64 106/107 2025-05-07T19:42:54.3431977Z Installing : findutils-1:4.8.0-2.amzn2023.0.2.x86_64 107/107 2025-05-07T19:42:54.3543752Z Running scriptlet: pam-1.5.1-8.amzn2023.0.4.x86_64 107/107 2025-05-07T19:42:55.1199569Z Running scriptlet: findutils-1:4.8.0-2.amzn2023.0.2.x86_64 107/107 2025-05-07T19:42:55.1201661Z Verifying : binutils-2.41-50.amzn2023.0.3.x86_64 1/107 2025-05-07T19:42:55.1203379Z Verifying : cracklib-2.9.6-27.amzn2023.0.2.x86_64 2/107 2025-05-07T19:42:55.1203980Z Verifying : cyrus-sasl-lib-2.1.27-18.amzn2023.0.3.x86_64 3/107 2025-05-07T19:42:55.1204643Z Verifying : elfutils-debuginfod-client-0.188-3.amzn2023.0.2. 
4/107 2025-05-07T19:42:55.1205472Z Verifying : findutils-1:4.8.0-2.amzn2023.0.2.x86_64 5/107 2025-05-07T19:42:55.1206049Z Verifying : git-2.47.1-1.amzn2023.0.2.x86_64 6/107 2025-05-07T19:42:55.1206658Z Verifying : git-core-2.47.1-1.amzn2023.0.2.x86_64 7/107 2025-05-07T19:42:55.1207316Z Verifying : git-core-doc-2.47.1-1.amzn2023.0.2.noarch 8/107 2025-05-07T19:42:55.1208339Z Verifying : gnutls-3.8.3-6.amzn2023.0.1.x86_64 9/107 2025-05-07T19:42:55.1208937Z Verifying : groff-base-1.22.4-7.amzn2023.0.2.x86_64 10/107 2025-05-07T19:42:55.1209597Z Verifying : gzip-1.12-1.amzn2023.0.1.x86_64 11/107 2025-05-07T19:42:55.1210229Z Verifying : hwdata-0.384-1.amzn2023.0.3.noarch 12/107 2025-05-07T19:42:55.1210798Z Verifying : jansson-2.14-0.amzn2023.x86_64 13/107 2025-05-07T19:42:55.1211451Z Verifying : kmod-libs-29-2.amzn2023.0.5.x86_64 14/107 2025-05-07T19:42:55.1212019Z Verifying : less-608-2.amzn2023.0.2.x86_64 15/107 2025-05-07T19:42:55.1212646Z Verifying : libcbor-0.7.0-3.amzn2023.0.2.x86_64 16/107 2025-05-07T19:42:55.1213293Z Verifying : libdb-5.3.28-49.amzn2023.0.2.x86_64 17/107 2025-05-07T19:42:55.1213875Z Verifying : libeconf-0.4.0-1.amzn2023.0.3.x86_64 18/107 2025-05-07T19:42:55.1214552Z Verifying : libedit-3.1-38.20210714cvs.amzn2023.0.2.x86_64 19/107 2025-05-07T19:42:55.1215222Z Verifying : libfdisk-2.37.4-1.amzn2023.0.4.x86_64 20/107 2025-05-07T19:42:55.1215847Z Verifying : libfido2-1.10.0-2.amzn2023.0.2.x86_64 21/107 2025-05-07T19:42:55.1216550Z Verifying : libmetalink-0.1.3-14.amzn2023.0.2.x86_64 22/107 2025-05-07T19:42:55.1217337Z Verifying : libpwquality-1.4.4-6.amzn2023.0.2.x86_64 23/107 2025-05-07T19:42:55.1218206Z Verifying : libsemanage-3.4-5.amzn2023.0.2.x86_64 24/107 2025-05-07T19:42:55.1218876Z Verifying : libutempter-1.2.1-4.amzn2023.0.2.x86_64 25/107 2025-05-07T19:42:55.1219525Z Verifying : nano-8.3-1.amzn2023.x86_64 26/107 2025-05-07T19:42:55.1220198Z Verifying : nano-default-editor-8.3-1.amzn2023.noarch 27/107 2025-05-07T19:42:55.1220825Z Verifying : 
ncurses-6.2-4.20200222.amzn2023.0.6.x86_64 28/107 2025-05-07T19:42:55.1221486Z Verifying : nettle-3.10.1-1.amzn2023.0.1.x86_64 29/107 2025-05-07T19:42:55.1222090Z Verifying : openldap-2.4.57-6.amzn2023.0.7.x86_64 30/107 2025-05-07T19:42:55.1222741Z Verifying : openssh-8.7p1-8.amzn2023.0.14.x86_64 31/107 2025-05-07T19:42:55.1223464Z Verifying : openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 32/107 2025-05-07T19:42:55.1224198Z Verifying : pam-1.5.1-8.amzn2023.0.4.x86_64 33/107 2025-05-07T19:42:55.1224905Z Verifying : pciutils-3.7.0-3.amzn2023.0.2.x86_64 34/107 2025-05-07T19:42:55.1225446Z Verifying : pciutils-libs-3.7.0-3.amzn2023.0.2.x86_64 35/107 2025-05-07T19:42:55.1226012Z Verifying : perl-AutoLoader-5.74-477.amzn2023.0.6.noarch 36/107 2025-05-07T19:42:55.1226721Z Verifying : perl-B-1.80-477.amzn2023.0.6.x86_64 37/107 2025-05-07T19:42:55.1227254Z Verifying : perl-Carp-1.50-458.amzn2023.0.2.noarch 38/107 2025-05-07T19:42:55.1227792Z Verifying : perl-Class-Struct-0.66-477.amzn2023.0.6.noarch 39/107 2025-05-07T19:42:55.1228375Z Verifying : perl-Data-Dumper-2.174-460.amzn2023.0.2.x86_64 40/107 2025-05-07T19:42:55.1228916Z Verifying : perl-Digest-1.20-1.amzn2023.0.2.noarch 41/107 2025-05-07T19:42:55.1229473Z Verifying : perl-Digest-MD5-2.58-2.amzn2023.0.2.x86_64 42/107 2025-05-07T19:42:55.1230036Z Verifying : perl-DynaLoader-1.47-477.amzn2023.0.6.x86_64 43/107 2025-05-07T19:42:55.1230562Z Verifying : perl-Encode-4:3.15-462.amzn2023.0.2.x86_64 44/107 2025-05-07T19:42:55.1231099Z Verifying : perl-Errno-1.30-477.amzn2023.0.6.x86_64 45/107 2025-05-07T19:42:55.1231709Z Verifying : perl-Error-1:0.17029-5.amzn2023.0.2.noarch 46/107 2025-05-07T19:42:55.1232260Z Verifying : perl-Exporter-5.74-459.amzn2023.0.2.noarch 47/107 2025-05-07T19:42:55.1232801Z Verifying : perl-Fcntl-1.13-477.amzn2023.0.6.x86_64 48/107 2025-05-07T19:42:55.1233366Z Verifying : perl-File-Basename-2.85-477.amzn2023.0.6.noarch 49/107 2025-05-07T19:42:55.1233937Z Verifying : 
perl-File-Find-1.37-477.amzn2023.0.6.noarch 50/107 2025-05-07T19:42:55.1234474Z Verifying : perl-File-Path-2.18-2.amzn2023.0.2.noarch 51/107 2025-05-07T19:42:55.1235077Z Verifying : perl-File-Temp-1:0.231.100-2.amzn2023.0.2.noarch 52/107 2025-05-07T19:42:55.1235626Z Verifying : perl-File-stat-1.09-477.amzn2023.0.6.noarch 53/107 2025-05-07T19:42:55.1236164Z Verifying : perl-FileHandle-2.03-477.amzn2023.0.6.noarch 54/107 2025-05-07T19:42:55.1236712Z Verifying : perl-Getopt-Long-1:2.52-2.amzn2023.0.2.noarch 55/107 2025-05-07T19:42:55.1237235Z Verifying : perl-Getopt-Std-1.12-477.amzn2023.0.6.noarch 56/107 2025-05-07T19:42:55.1237784Z Verifying : perl-Git-2.47.1-1.amzn2023.0.2.noarch 57/107 2025-05-07T19:42:55.1238304Z Verifying : perl-HTTP-Tiny-0.078-1.amzn2023.0.3.noarch 58/107 2025-05-07T19:42:55.1238829Z Verifying : perl-IO-1.43-477.amzn2023.0.6.x86_64 59/107 2025-05-07T19:42:55.1239364Z Verifying : perl-IO-Socket-IP-0.41-3.amzn2023.0.2.noarch 60/107 2025-05-07T19:42:55.1239899Z Verifying : perl-IO-Socket-SSL-2.075-1.amzn2023.0.2.noarch 61/107 2025-05-07T19:42:55.1240450Z Verifying : perl-IPC-Open3-1.21-477.amzn2023.0.6.noarch 62/107 2025-05-07T19:42:55.1240978Z Verifying : perl-MIME-Base64-3.16-2.amzn2023.0.2.x86_64 63/107 2025-05-07T19:42:55.1241526Z Verifying : perl-Mozilla-CA-20200520-4.amzn2023.0.2.noarch 64/107 2025-05-07T19:42:55.1242074Z Verifying : perl-NDBM_File-1.15-477.amzn2023.0.6.x86_64 65/107 2025-05-07T19:42:55.1242590Z Verifying : perl-Net-SSLeay-1.94-1.amzn2023.0.1.x86_64 66/107 2025-05-07T19:42:55.1243123Z Verifying : perl-POSIX-1.94-477.amzn2023.0.6.x86_64 67/107 2025-05-07T19:42:55.1243647Z Verifying : perl-PathTools-3.78-459.amzn2023.0.2.x86_64 68/107 2025-05-07T19:42:55.1244192Z Verifying : perl-Pod-Escapes-1:1.07-458.amzn2023.0.2.noarch 69/107 2025-05-07T19:42:55.1244728Z Verifying : perl-Pod-Perldoc-3.28.01-459.amzn2023.0.3.noarch 70/107 2025-05-07T19:42:55.1245279Z Verifying : perl-Pod-Simple-1:3.42-2.amzn2023.0.2.noarch 71/107 
2025-05-07T19:42:55.1245813Z Verifying : perl-Pod-Usage-4:2.01-2.amzn2023.0.2.noarch 72/107 2025-05-07T19:42:55.1246338Z Verifying : perl-Scalar-List-Utils-4:1.56-459.amzn2023.0.2.x 73/107 2025-05-07T19:42:55.1246996Z Verifying : perl-SelectSaver-1.02-477.amzn2023.0.6.noarch 74/107 2025-05-07T19:42:55.1247527Z Verifying : perl-Socket-4:2.032-1.amzn2023.0.2.x86_64 75/107 2025-05-07T19:42:55.1248057Z Verifying : perl-Storable-1:3.21-458.amzn2023.0.2.x86_64 76/107 2025-05-07T19:42:55.1248573Z Verifying : perl-Symbol-1.08-477.amzn2023.0.6.noarch 77/107 2025-05-07T19:42:55.1249132Z Verifying : perl-Term-ANSIColor-5.01-459.amzn2023.0.2.noarch 78/107 2025-05-07T19:42:55.1249687Z Verifying : perl-Term-Cap-1.17-458.amzn2023.0.2.noarch 79/107 2025-05-07T19:42:55.1250221Z Verifying : perl-TermReadKey-2.38-9.amzn2023.0.2.x86_64 80/107 2025-05-07T19:42:55.1250784Z Verifying : perl-Text-ParseWords-3.30-458.amzn2023.0.2.noarc 81/107 2025-05-07T19:42:55.1251334Z Verifying : perl-Text-Tabs+Wrap-2021.0726-1.amzn2023.0.1.noa 82/107 2025-05-07T19:42:55.1251884Z Verifying : perl-Time-Local-2:1.300-5.amzn2023.0.2.noarch 83/107 2025-05-07T19:42:55.1252485Z Verifying : perl-URI-5.09-1.amzn2023.0.2.noarch 84/107 2025-05-07T19:42:55.1252998Z Verifying : perl-base-2.27-477.amzn2023.0.6.noarch 85/107 2025-05-07T19:42:55.1253539Z Verifying : perl-constant-1.33-459.amzn2023.0.2.noarch 86/107 2025-05-07T19:42:55.1254061Z Verifying : perl-if-0.60.800-477.amzn2023.0.6.noarch 87/107 2025-05-07T19:42:55.1254593Z Verifying : perl-interpreter-4:5.32.1-477.amzn2023.0.6.x86_6 88/107 2025-05-07T19:42:55.1255105Z Verifying : perl-lib-0.65-477.amzn2023.0.6.x86_64 89/107 2025-05-07T19:42:55.1255640Z Verifying : perl-libnet-3.13-2.amzn2023.0.2.noarch 90/107 2025-05-07T19:42:55.1256167Z Verifying : perl-libs-4:5.32.1-477.amzn2023.0.6.x86_64 91/107 2025-05-07T19:42:55.1256662Z Verifying : perl-mro-1.23-477.amzn2023.0.6.x86_64 92/107 2025-05-07T19:42:55.1257313Z Verifying : 
perl-overload-1.31-477.amzn2023.0.6.noarch 93/107 2025-05-07T19:42:55.1258053Z Verifying : perl-overloading-0.02-477.amzn2023.0.6.noarch 94/107 2025-05-07T19:42:55.1258644Z Verifying : perl-parent-1:0.238-458.amzn2023.0.2.noarch 95/107 2025-05-07T19:42:55.1259197Z Verifying : perl-podlators-1:4.14-458.amzn2023.0.2.noarch 96/107 2025-05-07T19:42:55.1259729Z Verifying : perl-subs-1.03-477.amzn2023.0.6.noarch 97/107 2025-05-07T19:42:55.1260276Z Verifying : perl-vars-1.05-477.amzn2023.0.6.noarch 98/107 2025-05-07T19:42:55.1260803Z Verifying : shadow-utils-2:4.9-12.amzn2023.0.4.x86_64 99/107 2025-05-07T19:42:55.1261339Z Verifying : sudo-1.9.15-1.p5.amzn2023.0.1.x86_64 100/107 2025-05-07T19:42:55.1261870Z Verifying : sudo-python-plugin-1.9.15-1.p5.amzn2023.0.1.x86_ 101/107 2025-05-07T19:42:55.1262461Z Verifying : systemd-libs-252.23-3.amzn2023.x86_64 102/107 2025-05-07T19:42:55.1262988Z Verifying : tar-2:1.34-1.amzn2023.0.4.x86_64 103/107 2025-05-07T19:42:55.1263495Z Verifying : util-linux-2.37.4-1.amzn2023.0.4.x86_64 104/107 2025-05-07T19:42:55.1264046Z Verifying : util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 105/107 2025-05-07T19:42:55.1264568Z Verifying : wget-1.21.3-1.amzn2023.0.4.x86_64 106/107 2025-05-07T19:42:55.2221280Z Verifying : which-2.21-26.amzn2023.0.2.x86_64 107/107 2025-05-07T19:42:55.2222186Z 2025-05-07T19:42:55.2222353Z Installed: 2025-05-07T19:42:55.2222816Z binutils-2.41-50.amzn2023.0.3.x86_64 2025-05-07T19:42:55.2223366Z cracklib-2.9.6-27.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2224084Z cyrus-sasl-lib-2.1.27-18.amzn2023.0.3.x86_64 2025-05-07T19:42:55.2224995Z elfutils-debuginfod-client-0.188-3.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2225581Z findutils-1:4.8.0-2.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2226083Z git-2.47.1-1.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2226596Z git-core-2.47.1-1.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2227127Z git-core-doc-2.47.1-1.amzn2023.0.2.noarch 2025-05-07T19:42:55.2227671Z gnutls-3.8.3-6.amzn2023.0.1.x86_64 
2025-05-07T19:42:55.2228200Z groff-base-1.22.4-7.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2228703Z gzip-1.12-1.amzn2023.0.1.x86_64 2025-05-07T19:42:55.2229239Z hwdata-0.384-1.amzn2023.0.3.noarch 2025-05-07T19:42:55.2229884Z jansson-2.14-0.amzn2023.x86_64 2025-05-07T19:42:55.2230636Z kmod-libs-29-2.amzn2023.0.5.x86_64 2025-05-07T19:42:55.2231114Z less-608-2.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2231599Z libcbor-0.7.0-3.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2232089Z libdb-5.3.28-49.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2232578Z libeconf-0.4.0-1.amzn2023.0.3.x86_64 2025-05-07T19:42:55.2233126Z libedit-3.1-38.20210714cvs.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2233649Z libfdisk-2.37.4-1.amzn2023.0.4.x86_64 2025-05-07T19:42:55.2234224Z libfido2-1.10.0-2.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2234759Z libmetalink-0.1.3-14.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2235308Z libpwquality-1.4.4-6.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2235823Z libsemanage-3.4-5.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2236351Z libutempter-1.2.1-4.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2236833Z nano-8.3-1.amzn2023.x86_64 2025-05-07T19:42:55.2237353Z nano-default-editor-8.3-1.amzn2023.noarch 2025-05-07T19:42:55.2237906Z ncurses-6.2-4.20200222.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2238398Z nettle-3.10.1-1.amzn2023.0.1.x86_64 2025-05-07T19:42:55.2238905Z openldap-2.4.57-6.amzn2023.0.7.x86_64 2025-05-07T19:42:55.2239403Z openssh-8.7p1-8.amzn2023.0.14.x86_64 2025-05-07T19:42:55.2239945Z openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 2025-05-07T19:42:55.2240446Z pam-1.5.1-8.amzn2023.0.4.x86_64 2025-05-07T19:42:55.2240946Z pciutils-3.7.0-3.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2241474Z pciutils-libs-3.7.0-3.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2241996Z perl-AutoLoader-5.74-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2242504Z perl-B-1.80-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2243002Z perl-Carp-1.50-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.2243588Z 
perl-Class-Struct-0.66-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2244145Z perl-Data-Dumper-2.174-460.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2244867Z perl-Digest-1.20-1.amzn2023.0.2.noarch 2025-05-07T19:42:55.2245469Z perl-Digest-MD5-2.58-2.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2246012Z perl-DynaLoader-1.47-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2246584Z perl-Encode-4:3.15-462.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2247106Z perl-Errno-1.30-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2247681Z perl-Error-1:0.17029-5.amzn2023.0.2.noarch 2025-05-07T19:42:55.2248254Z perl-Exporter-5.74-459.amzn2023.0.2.noarch 2025-05-07T19:42:55.2248787Z perl-Fcntl-1.13-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2249352Z perl-File-Basename-2.85-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2249901Z perl-File-Find-1.37-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2251510Z perl-File-Path-2.18-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.2252063Z perl-File-Temp-1:0.231.100-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.2252579Z perl-File-stat-1.09-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2253133Z perl-FileHandle-2.03-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2253656Z perl-Getopt-Long-1:2.52-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.2254197Z perl-Getopt-Std-1.12-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2254728Z perl-Git-2.47.1-1.amzn2023.0.2.noarch 2025-05-07T19:42:55.2255234Z perl-HTTP-Tiny-0.078-1.amzn2023.0.3.noarch 2025-05-07T19:42:55.2255755Z perl-IO-1.43-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2256265Z perl-IO-Socket-IP-0.41-3.amzn2023.0.2.noarch 2025-05-07T19:42:55.2256820Z perl-IO-Socket-SSL-2.075-1.amzn2023.0.2.noarch 2025-05-07T19:42:55.2257643Z perl-IPC-Open3-1.21-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2258228Z perl-MIME-Base64-3.16-2.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2258832Z perl-Mozilla-CA-20200520-4.amzn2023.0.2.noarch 2025-05-07T19:42:55.2259414Z perl-NDBM_File-1.15-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2260003Z 
perl-Net-SSLeay-1.94-1.amzn2023.0.1.x86_64 2025-05-07T19:42:55.2260574Z perl-POSIX-1.94-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2261172Z perl-PathTools-3.78-459.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2261763Z perl-Pod-Escapes-1:1.07-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.2262371Z perl-Pod-Perldoc-3.28.01-459.amzn2023.0.3.noarch 2025-05-07T19:42:55.2262974Z perl-Pod-Simple-1:3.42-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.2263533Z perl-Pod-Usage-4:2.01-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.2264136Z perl-Scalar-List-Utils-4:1.56-459.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2264736Z perl-SelectSaver-1.02-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2265336Z perl-Socket-4:2.032-1.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2265910Z perl-Storable-1:3.21-458.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2266474Z perl-Symbol-1.08-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2267180Z perl-Term-ANSIColor-5.01-459.amzn2023.0.2.noarch 2025-05-07T19:42:55.2267963Z perl-Term-Cap-1.17-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.2268577Z perl-TermReadKey-2.38-9.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2269167Z perl-Text-ParseWords-3.30-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.2269790Z perl-Text-Tabs+Wrap-2021.0726-1.amzn2023.0.1.noarch 2025-05-07T19:42:55.2270383Z perl-Time-Local-2:1.300-5.amzn2023.0.2.noarch 2025-05-07T19:42:55.2270920Z perl-URI-5.09-1.amzn2023.0.2.noarch 2025-05-07T19:42:55.2271472Z perl-base-2.27-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2272024Z perl-constant-1.33-459.amzn2023.0.2.noarch 2025-05-07T19:42:55.2272590Z perl-if-0.60.800-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2273373Z perl-interpreter-4:5.32.1-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2273912Z perl-lib-0.65-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2274472Z perl-libnet-3.13-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.2275017Z perl-libs-4:5.32.1-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2275556Z perl-mro-1.23-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.2276103Z 
perl-overload-1.31-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2276700Z perl-overloading-0.02-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2277285Z perl-parent-1:0.238-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.2277835Z perl-podlators-1:4.14-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.2278409Z perl-subs-1.03-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2279077Z perl-vars-1.05-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.2279723Z shadow-utils-2:4.9-12.amzn2023.0.4.x86_64 2025-05-07T19:42:55.2280204Z sudo-1.9.15-1.p5.amzn2023.0.1.x86_64 2025-05-07T19:42:55.2280717Z sudo-python-plugin-1.9.15-1.p5.amzn2023.0.1.x86_64 2025-05-07T19:42:55.2281250Z systemd-libs-252.23-3.amzn2023.x86_64 2025-05-07T19:42:55.2281727Z tar-2:1.34-1.amzn2023.0.4.x86_64 2025-05-07T19:42:55.2282209Z util-linux-2.37.4-1.amzn2023.0.4.x86_64 2025-05-07T19:42:55.2282719Z util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 2025-05-07T19:42:55.2283227Z wget-1.21.3-1.amzn2023.0.4.x86_64 2025-05-07T19:42:55.2283732Z which-2.21-26.amzn2023.0.2.x86_64 2025-05-07T19:42:55.2284023Z 2025-05-07T19:42:55.2284105Z Complete! 
2025-05-07T19:42:55.2941420Z ##[group]Run actions/checkout@v4 2025-05-07T19:42:55.2941751Z with: 2025-05-07T19:42:55.2941975Z submodules: true 2025-05-07T19:42:55.2942218Z repository: pytorch/FBGEMM 2025-05-07T19:42:55.2942685Z token: *** 2025-05-07T19:42:55.2942893Z ssh-strict: true 2025-05-07T19:42:55.2943127Z ssh-user: git 2025-05-07T19:42:55.2943354Z persist-credentials: true 2025-05-07T19:42:55.2943626Z clean: true 2025-05-07T19:42:55.2943875Z sparse-checkout-cone-mode: true 2025-05-07T19:42:55.2944160Z fetch-depth: 1 2025-05-07T19:42:55.2944390Z fetch-tags: false 2025-05-07T19:42:55.2944614Z show-progress: true 2025-05-07T19:42:55.2944853Z lfs: false 2025-05-07T19:42:55.2945065Z set-safe-directory: true 2025-05-07T19:42:55.2945521Z env: 2025-05-07T19:42:55.2945740Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:42:55.2946067Z BUILD_ENV: build_binary 2025-05-07T19:42:55.2946314Z BUILD_TARGET: genai 2025-05-07T19:42:55.2946559Z BUILD_VARIANT: cuda 2025-05-07T19:42:55.2946858Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:42:55.2947112Z ##[endgroup] 2025-05-07T19:42:55.2987943Z ##[command]/usr/bin/docker exec 684dd9e27594bd891e6be6ff1899ad1c0eb766565fb4b8e7ad6703697bf1bd21 sh -c "cat /etc/*release | grep ^ID" 2025-05-07T19:42:55.6128588Z Syncing repository: pytorch/FBGEMM 2025-05-07T19:42:55.6130016Z ##[group]Getting Git version info 2025-05-07T19:42:55.6130428Z Working directory is '/__w/FBGEMM/FBGEMM' 2025-05-07T19:42:55.6130956Z [command]/usr/bin/git version 2025-05-07T19:42:55.6131280Z git version 2.47.1 2025-05-07T19:42:55.6133041Z ##[endgroup] 2025-05-07T19:42:55.6144387Z Temporarily overriding HOME='/__w/_temp/fd968ade-af19-414a-94a6-380ce17d7d80' before making global git config changes 2025-05-07T19:42:55.6145249Z Adding repository directory to the temporary git global config as a safe directory 2025-05-07T19:42:55.6159831Z [command]/usr/bin/git config --global --add safe.directory /__w/FBGEMM/FBGEMM 2025-05-07T19:42:55.6199243Z [command]/usr/bin/git 
config --local --get remote.origin.url 2025-05-07T19:42:55.6219882Z https://github.com/pytorch/FBGEMM 2025-05-07T19:42:55.6232844Z ##[group]Removing previously created refs, to avoid conflicts 2025-05-07T19:42:55.6236491Z [command]/usr/bin/git rev-parse --symbolic-full-name --verify --quiet HEAD 2025-05-07T19:42:55.6256773Z HEAD 2025-05-07T19:42:55.6287420Z ##[endgroup] 2025-05-07T19:42:55.6287698Z [command]/usr/bin/git submodule status 2025-05-07T19:42:55.6617370Z e5d7c0bd5d9aec44d68830187138149e6a8c4e32 external/asmjit (e5d7c0b) 2025-05-07T19:42:55.6687810Z 4a61bdd4bd4ed730e078aebc7c0fcf046ff29406 external/composable_kernel (4a61bdd) 2025-05-07T19:42:55.6756370Z 6543fec09b2f04ac4a666882998b534afc9c1349 external/cpuinfo (6543fec) 2025-05-07T19:42:55.6826275Z 3ed8d2ec4ba35ef5d9d8353826209b6f868f63d3 external/cutlass (3ed8d2e) 2025-05-07T19:42:55.6885137Z f8d7d77c06936315286eb55f8de22cd23c188571 external/googletest (f8d7d77) 2025-05-07T19:42:55.6960536Z 420084499c7c1e1c2d801922f40df202eac5f3a0 external/hipify_torch (4200844) 2025-05-07T19:42:55.7022893Z 9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03 external/json (9cca280) 2025-05-07T19:42:55.7031082Z ##[group]Cleaning the repository 2025-05-07T19:42:55.7032061Z [command]/usr/bin/git clean -ffdx 2025-05-07T19:42:55.7595157Z Removing amdgpu-install_6.2.60204-1_all.deb 2025-05-07T19:42:55.7596287Z Removing collect_env.py 2025-05-07T19:42:55.7597035Z Removing fbgemm_gpu/_skbuild/ 2025-05-07T19:42:55.7598147Z Removing fbgemm_gpu/bench/verify_fp16_stochastic_benchmark.hip 2025-05-07T19:42:55.7599454Z Removing fbgemm_gpu/codegen/genscript/__pycache__/ 2025-05-07T19:42:55.7601106Z Removing fbgemm_gpu/codegen/inference/embedding_forward_quantized_cpu_template_hip.cpp 2025-05-07T19:42:55.7602638Z Removing fbgemm_gpu/codegen/inference/embedding_forward_quantized_host_cpu_hip.cpp 2025-05-07T19:42:55.7603306Z Removing fbgemm_gpu/codegen/inference/embedding_forward_quantized_host_hip.cpp 2025-05-07T19:42:55.7604295Z Removing 
fbgemm_gpu/codegen/inference/embedding_forward_quantized_split_lookup.hip 2025-05-07T19:42:55.7605037Z Removing fbgemm_gpu/codegen/inference/embedding_forward_quantized_split_nbit_host_template.hip 2025-05-07T19:42:55.7605851Z Removing fbgemm_gpu/codegen/inference/embedding_forward_quantized_split_nbit_kernel_template.hip 2025-05-07T19:42:55.7606611Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_dense_host_cpu_hip.cpp 2025-05-07T19:42:55.7607499Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_cpu_approx_template_hip.cpp 2025-05-07T19:42:55.7608453Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_cpu_template_hip.cpp 2025-05-07T19:42:55.7609255Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_device_kernel_template_hip.cuh 2025-05-07T19:42:55.7610205Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_grad_template.hip 2025-05-07T19:42:55.7610978Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_host_cpu_template_hip.cpp 2025-05-07T19:42:55.7611803Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_host_template_hip.cpp 2025-05-07T19:42:55.7612625Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_indice_weights_template.hip 2025-05-07T19:42:55.7613430Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_kernel_cta_template.hip 2025-05-07T19:42:55.7614246Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_kernel_warp_template.hip 2025-05-07T19:42:55.7615022Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_meta_template_hip.cpp 2025-05-07T19:42:55.7615765Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_template.hip 2025-05-07T19:42:55.7616460Z Removing fbgemm_gpu/codegen/training/forward/embedding_forward_split_cpu_hip.cpp 2025-05-07T19:42:55.7617343Z Removing 
fbgemm_gpu/codegen/training/forward/embedding_forward_split_kernel_nobag_small_template.hip 2025-05-07T19:42:55.7618449Z Removing fbgemm_gpu/codegen/training/forward/embedding_forward_split_kernel_template.hip 2025-05-07T19:42:55.7619210Z Removing fbgemm_gpu/codegen/training/forward/embedding_forward_split_kernel_v2_template.hip 2025-05-07T19:42:55.7619968Z Removing fbgemm_gpu/codegen/training/forward/embedding_forward_split_template.hip 2025-05-07T19:42:55.7620686Z Removing fbgemm_gpu/codegen/training/index_select/batch_index_select_dim0_cpu_host_hip.cpp 2025-05-07T19:42:55.7621389Z Removing fbgemm_gpu/codegen/training/index_select/batch_index_select_dim0_ops_hip.cpp 2025-05-07T19:42:55.7622194Z Removing fbgemm_gpu/codegen/training/optimizer/embedding_optimizer_split_device_kernel_template_hip.cuh 2025-05-07T19:42:55.7623039Z Removing fbgemm_gpu/codegen/training/optimizer/embedding_optimizer_split_host_template_hip.cpp 2025-05-07T19:42:55.7623991Z Removing fbgemm_gpu/codegen/training/optimizer/embedding_optimizer_split_kernel_template.hip 2025-05-07T19:42:55.7624847Z Removing fbgemm_gpu/codegen/training/optimizer/embedding_optimizer_split_template.hip 2025-05-07T19:42:55.7625545Z Removing fbgemm_gpu/codegen/training/pt2/embedding_split_host_pt2_autograd_template_hip.cpp 2025-05-07T19:42:55.7626292Z Removing fbgemm_gpu/codegen/training/pt2/embedding_split_host_pt2_cpu_wrapper_template_hip.cpp 2025-05-07T19:42:55.7627012Z Removing fbgemm_gpu/codegen/training/pt2/embedding_split_host_pt2_hip_wrapper_template.cpp 2025-05-07T19:42:55.7627854Z Removing fbgemm_gpu/codegen/utils/embedding_bounds_check_host_cpu_hip.cpp 2025-05-07T19:42:55.7628420Z Removing fbgemm_gpu/codegen/utils/embedding_bounds_check_host_hip.cpp 2025-05-07T19:42:55.7628966Z Removing fbgemm_gpu/codegen/utils/embedding_bounds_check_v1.hip 2025-05-07T19:42:55.7629483Z Removing fbgemm_gpu/codegen/utils/embedding_bounds_check_v2.hip 2025-05-07T19:42:55.7629891Z Removing fbgemm_gpu/dist/ 
2025-05-07T19:42:55.7630286Z Removing fbgemm_gpu/experimental/example/src/cutlass_sgemm_nn.hip 2025-05-07T19:42:55.7630807Z Removing fbgemm_gpu/experimental/example/src/example_nccl_hip.cpp 2025-05-07T19:42:55.7631509Z Removing fbgemm_gpu/experimental/gen_ai/src/attention/gqa_attn_splitk.hip 2025-05-07T19:42:55.7632065Z Removing fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.hip 2025-05-07T19:42:55.7632574Z Removing fbgemm_gpu/experimental/gen_ai/src/comm/car.hip 2025-05-07T19:42:55.7633052Z Removing fbgemm_gpu/experimental/gen_ai/src/comm/car_hip.cpp 2025-05-07T19:42:55.7633701Z Removing fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.hip 2025-05-07T19:42:55.7634269Z Removing fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.hip 2025-05-07T19:42:55.7634772Z Removing fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache_hip.cpp 2025-05-07T19:42:55.7635359Z Removing fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.hip 2025-05-07T19:42:55.7636153Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_common_hip.h 2025-05-07T19:42:55.7637255Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_common_hip.h 2025-05-07T19:42:55.7638228Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_common_hip.h 2025-05-07T19:42:55.7670893Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common_hip.h 2025-05-07T19:42:55.7671871Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fused_moe/fused_moe_op_hip.cpp 2025-05-07T19:42:55.7672617Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cublas_utils_hip.h 2025-05-07T19:42:55.7673331Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.hip 2025-05-07T19:42:55.7674120Z Removing 
fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16.hip 2025-05-07T19:42:55.7674958Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.hip 2025-05-07T19:42:55.7675955Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.hip 2025-05-07T19:42:55.7676747Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16.hip 2025-05-07T19:42:55.7677626Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.hip 2025-05-07T19:42:55.7678498Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.hip 2025-05-07T19:42:55.7679367Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.hip 2025-05-07T19:42:55.7680198Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.hip 2025-05-07T19:42:55.7681061Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.hip 2025-05-07T19:42:55.7681899Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.hip 2025-05-07T19:42:55.7682770Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.hip 2025-05-07T19:42:55.7683622Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.hip 2025-05-07T19:42:55.7684457Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.hip 2025-05-07T19:42:55.7685318Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.hip 2025-05-07T19:42:55.7686179Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.hip 
2025-05-07T19:42:55.7687010Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.hip 2025-05-07T19:42:55.7687877Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.hip 2025-05-07T19:42:55.7688929Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.hip 2025-05-07T19:42:55.7689797Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.hip 2025-05-07T19:42:55.7690661Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.hip 2025-05-07T19:42:55.7691502Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.hip 2025-05-07T19:42:55.7692359Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.hip 2025-05-07T19:42:55.7693200Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.hip 2025-05-07T19:42:55.7694150Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.hip 2025-05-07T19:42:55.7694999Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.hip 2025-05-07T19:42:55.7695839Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.hip 2025-05-07T19:42:55.7696690Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.hip 2025-05-07T19:42:55.7697830Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.hip 2025-05-07T19:42:55.7698727Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common_hip.cuh 2025-05-07T19:42:55.7699627Z Removing 
fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_manifest_hip.cuh 2025-05-07T19:42:55.7700424Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16.hip 2025-05-07T19:42:55.7701194Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_blockwise.hip 2025-05-07T19:42:55.7701959Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_cublas.hip 2025-05-07T19:42:55.7702740Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_lite.hip 2025-05-07T19:42:55.7703516Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise.hip 2025-05-07T19:42:55.7704507Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.hip 2025-05-07T19:42:55.7705550Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.hip 2025-05-07T19:42:55.7706562Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.hip 2025-05-07T19:42:55.7707602Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.hip 2025-05-07T19:42:55.7708645Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.hip 2025-05-07T19:42:55.7709643Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.hip 2025-05-07T19:42:55.7710673Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.hip 2025-05-07T19:42:55.7711698Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.hip 
2025-05-07T19:42:55.7712690Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.hip 2025-05-07T19:42:55.7713687Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_common_hip.cuh 2025-05-07T19:42:55.7714600Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/common_hip.cuh 2025-05-07T19:42:55.7715801Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.hip 2025-05-07T19:42:55.7717086Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.hip 2025-05-07T19:42:55.7718330Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.hip 2025-05-07T19:42:55.7719362Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.hip 2025-05-07T19:42:55.7720388Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.hip 2025-05-07T19:42:55.7721335Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.hip 2025-05-07T19:42:55.7722116Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.hip 2025-05-07T19:42:55.7722834Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_rowwise.hip 2025-05-07T19:42:55.7723577Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled.hip 2025-05-07T19:42:55.7724354Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.hip 2025-05-07T19:42:55.7725070Z Removing 
fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16.hip 2025-05-07T19:42:55.7725775Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16_dynamic.hip 2025-05-07T19:42:55.7726587Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/include/fp8_blockwise_cutlass_helpers_hip.h 2025-05-07T19:42:55.7727430Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.hip 2025-05-07T19:42:55.7728130Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16_fast_gemv.hip 2025-05-07T19:42:55.7728780Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.hip 2025-05-07T19:42:55.7729470Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.hip 2025-05-07T19:42:55.7730116Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.hip 2025-05-07T19:42:55.7730796Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv_hip.cuh 2025-05-07T19:42:55.7731457Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/utility_hip.cuh 2025-05-07T19:42:55.7732063Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.hip 2025-05-07T19:42:55.7732608Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/quantize_hip.cpp 2025-05-07T19:42:55.7733067Z Removing fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:42:55.7733482Z Removing fbgemm_gpu/fbgemm_gpu_nightly.egg-info/ 2025-05-07T19:42:55.7733893Z Removing fbgemm_gpu/include/fbgemm_gpu/cumem_utils_hip.h 2025-05-07T19:42:55.7734450Z Removing fbgemm_gpu/include/fbgemm_gpu/embedding_backward_template_helpers_hip.cuh 2025-05-07T19:42:55.7735049Z Removing fbgemm_gpu/include/fbgemm_gpu/embedding_forward_split_cpu_hip.h 2025-05-07T19:42:55.7735661Z Removing fbgemm_gpu/include/fbgemm_gpu/embedding_forward_template_helpers_hip.cuh 2025-05-07T19:42:55.7736246Z Removing 
fbgemm_gpu/include/fbgemm_gpu/layout_transform_ops_hip.cuh 2025-05-07T19:42:55.7736794Z Removing fbgemm_gpu/include/fbgemm_gpu/permute_multi_embedding_function_hip.h 2025-05-07T19:42:55.7737601Z Removing fbgemm_gpu/include/fbgemm_gpu/quantize_ops_hip.cuh 2025-05-07T19:42:55.7738130Z Removing fbgemm_gpu/include/fbgemm_gpu/sparse_ops_hip.cuh 2025-05-07T19:42:55.7738670Z Removing fbgemm_gpu/include/fbgemm_gpu/split_embeddings_utils_hip.cuh 2025-05-07T19:42:55.7739266Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/barrier_isolation_hip.cuh 2025-05-07T19:42:55.7739797Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/bench_utils_hip.cuh 2025-05-07T19:42:55.7740456Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/bitonic_sort_hip.cuh 2025-05-07T19:42:55.7741028Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/cub_namespace_postfix_hip.cuh 2025-05-07T19:42:55.7741664Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/cub_namespace_prefix_hip.cuh 2025-05-07T19:42:55.7742262Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/device_cache_flusher_hip.cuh 2025-05-07T19:42:55.7742875Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/device_properties_hip.cuh 2025-05-07T19:42:55.7743461Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/dispatch_macros_hip.h 2025-05-07T19:42:55.7744068Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/embedding_bounds_check_common_hip.cuh 2025-05-07T19:42:55.7744706Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/find_qparams_hip.cuh 2025-05-07T19:42:55.7745284Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/float_hip.cuh 2025-05-07T19:42:55.7745801Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/hip_prelude.cuh 2025-05-07T19:42:55.7746355Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/host_device_buffer_pair_hip.cuh 2025-05-07T19:42:55.7746991Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/inclusive_sum_scan_hip.cuh 2025-05-07T19:42:55.7747589Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/kernel_launcher_hip.cuh 2025-05-07T19:42:55.7748181Z Removing 
fbgemm_gpu/include/fbgemm_gpu/utils/rocm/stochastic_rounding_hip.h 2025-05-07T19:42:55.7748767Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/rocm/vec2_hip.h 2025-05-07T19:42:55.7749280Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/rocm/weight_row_hip.h 2025-05-07T19:42:55.7749960Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/shared_memory_hip.cuh 2025-05-07T19:42:55.7750512Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding_hip.cuh 2025-05-07T19:42:55.7751238Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/tensor_accessor_builder_hip.h 2025-05-07T19:42:55.7751777Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/tensor_accessor_hip.h 2025-05-07T19:42:55.7752246Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/vec4_hip.cuh 2025-05-07T19:42:55.7752719Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/vec4acc_hip.cuh 2025-05-07T19:42:55.7753180Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/vec_quant_hip.cuh 2025-05-07T19:42:55.7753652Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/vecn_hip.cuh 2025-05-07T19:42:55.7754108Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/weight_row_hip.cuh 2025-05-07T19:42:55.7754667Z Removing fbgemm_gpu/src/dram_kv_embedding_cache/dram_kv_embedding_cache_hip.h 2025-05-07T19:42:55.7755293Z Removing fbgemm_gpu/src/dram_kv_embedding_cache/dram_kv_embedding_cache_wrapper_hip.h 2025-05-07T19:42:55.7755896Z Removing fbgemm_gpu/src/embedding_inplace_ops/embedding_inplace_update.hip 2025-05-07T19:42:55.7756525Z Removing fbgemm_gpu/src/embedding_inplace_ops/embedding_inplace_update_gpu_hip.cpp 2025-05-07T19:42:55.7757081Z Removing fbgemm_gpu/src/histogram_binning_calibration_ops.hip 2025-05-07T19:42:55.7757565Z Removing fbgemm_gpu/src/input_combine_ops/input_combine.hip 2025-05-07T19:42:55.7758072Z Removing fbgemm_gpu/src/input_combine_ops/input_combine_cpu_hip.cpp 2025-05-07T19:42:55.7758667Z Removing fbgemm_gpu/src/intraining_embedding_pruning_ops/intraining_embedding_pruning.hip 2025-05-07T19:42:55.7759409Z Removing 
fbgemm_gpu/src/intraining_embedding_pruning_ops/intraining_embedding_pruning_gpu_hip.cpp 2025-05-07T19:42:55.7760103Z Removing fbgemm_gpu/src/jagged_tensor_ops/batched_dense_vec_jagged_2d_mul_backward.hip 2025-05-07T19:42:55.7760774Z Removing fbgemm_gpu/src/jagged_tensor_ops/batched_dense_vec_jagged_2d_mul_forward.hip 2025-05-07T19:42:55.7761315Z Removing fbgemm_gpu/src/jagged_tensor_ops/common_hip.cuh 2025-05-07T19:42:55.7761817Z Removing fbgemm_gpu/src/jagged_tensor_ops/dense_to_jagged_forward.hip 2025-05-07T19:42:55.7762378Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_dense_bmm_forward.hip 2025-05-07T19:42:55.7763037Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_dense_dense_elementwise_add_jagged_output_forward.hip 2025-05-07T19:42:55.7763771Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_dense_elementwise_mul_backward.hip 2025-05-07T19:42:55.7764511Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_dense_elementwise_mul_forward.hip 2025-05-07T19:42:55.7765130Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_index_add_2d_forward.hip 2025-05-07T19:42:55.7765714Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_index_select_2d_forward.hip 2025-05-07T19:42:55.7766271Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_jagged_bmm_forward.hip 2025-05-07T19:42:55.7766823Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_softmax_backward.hip 2025-05-07T19:42:55.7767495Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_softmax_forward.hip 2025-05-07T19:42:55.7768347Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_tensor_ops.hip 2025-05-07T19:42:55.7768886Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_tensor_ops_cpu_hip.cpp 2025-05-07T19:42:55.7769630Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_to_padded_dense_backward.hip 2025-05-07T19:42:55.7770251Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_to_padded_dense_forward.hip 2025-05-07T19:42:55.7770834Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_unique_indices.hip 
2025-05-07T19:42:55.7771438Z Removing fbgemm_gpu/src/jagged_tensor_ops/keyed_jagged_index_select_dim1.hip 2025-05-07T19:42:55.7772030Z Removing fbgemm_gpu/src/layout_transform_ops/layout_transform_ops.hip 2025-05-07T19:42:55.7772649Z Removing fbgemm_gpu/src/layout_transform_ops/layout_transform_ops_cpu_hip.cpp 2025-05-07T19:42:55.7773197Z Removing fbgemm_gpu/src/memory_utils/common_hip.cuh 2025-05-07T19:42:55.7773622Z Removing fbgemm_gpu/src/memory_utils/memory_utils.hip 2025-05-07T19:42:55.7774099Z Removing fbgemm_gpu/src/memory_utils/memory_utils_hip.cpp 2025-05-07T19:42:55.7774562Z Removing fbgemm_gpu/src/memory_utils/memory_utils_ops.hip 2025-05-07T19:42:55.7775071Z Removing fbgemm_gpu/src/memory_utils/memory_utils_ops_hip.cpp 2025-05-07T19:42:55.7775669Z Removing fbgemm_gpu/src/merge_pooled_embedding_ops/merge_pooled_embedding_ops_cpu_hip.cpp 2025-05-07T19:42:55.7776415Z Removing fbgemm_gpu/src/merge_pooled_embedding_ops/merge_pooled_embedding_ops_gpu_hip.cpp 2025-05-07T19:42:55.7777107Z Removing fbgemm_gpu/src/metric_ops/metric_ops.hip 2025-05-07T19:42:55.7777697Z Removing fbgemm_gpu/src/permute_multi_embedding_ops/permute_multi_embedding_function_hip.cpp 2025-05-07T19:42:55.7778446Z Removing fbgemm_gpu/src/permute_multi_embedding_ops/permute_multi_embedding_ops.hip 2025-05-07T19:42:55.7779157Z Removing fbgemm_gpu/src/permute_multi_embedding_ops/permute_multi_embedding_ops_cpu_hip.cpp 2025-05-07T19:42:55.7779910Z Removing fbgemm_gpu/src/permute_pooled_embedding_ops/permute_pooled_embedding_ops.hip 2025-05-07T19:42:55.7780640Z Removing fbgemm_gpu/src/permute_pooled_embedding_ops/permute_pooled_embedding_ops_cpu_hip.cpp 2025-05-07T19:42:55.7781409Z Removing fbgemm_gpu/src/permute_pooled_embedding_ops/permute_pooled_embedding_ops_split.hip 2025-05-07T19:42:55.7782184Z Removing fbgemm_gpu/src/ps_split_embeddings_cache/ps_split_table_batched_embeddings_hip.cpp 2025-05-07T19:42:55.7782890Z Removing 
fbgemm_gpu/src/ps_split_embeddings_cache/ps_table_batched_embeddings_hip.h 2025-05-07T19:42:55.7783441Z Removing fbgemm_gpu/src/quantize_ops/common_hip.cuh 2025-05-07T19:42:55.7783885Z Removing fbgemm_gpu/src/quantize_ops/mx/common_hip.cuh 2025-05-07T19:42:55.7784313Z Removing fbgemm_gpu/src/quantize_ops/mx_common_hip.cuh 2025-05-07T19:42:55.7784790Z Removing fbgemm_gpu/src/quantize_ops/quantize_bfloat16.hip 2025-05-07T19:42:55.7785276Z Removing fbgemm_gpu/src/quantize_ops/quantize_fp8_rowwise.hip 2025-05-07T19:42:55.7785818Z Removing fbgemm_gpu/src/quantize_ops/quantize_fused_8bit_rowwise.hip 2025-05-07T19:42:55.7786368Z Removing fbgemm_gpu/src/quantize_ops/quantize_fused_nbit_rowwise.hip 2025-05-07T19:42:55.7786892Z Removing fbgemm_gpu/src/quantize_ops/quantize_hfp8.hip 2025-05-07T19:42:55.7787362Z Removing fbgemm_gpu/src/quantize_ops/quantize_msfp.hip 2025-05-07T19:42:55.7787790Z Removing fbgemm_gpu/src/quantize_ops/quantize_mx.hip 2025-05-07T19:42:55.7788260Z Removing fbgemm_gpu/src/quantize_ops/quantize_mx_hip.cuh 2025-05-07T19:42:55.7788832Z Removing fbgemm_gpu/src/quantize_ops/quantize_ops_cpu_hip.cpp 2025-05-07T19:42:55.7789498Z Removing fbgemm_gpu/src/quantize_ops/quantize_padded_fp8_rowwise.hip 2025-05-07T19:42:55.7789938Z Removing fbgemm_gpu/src/sparse_ops/common_hip.cuh 2025-05-07T19:42:55.7790404Z Removing fbgemm_gpu/src/sparse_ops/sparse_async_batched_cumsum.hip 2025-05-07T19:42:55.7790943Z Removing fbgemm_gpu/src/sparse_ops/sparse_async_batched_cumsum_hip.cpp 2025-05-07T19:42:55.7791419Z Removing fbgemm_gpu/src/sparse_ops/sparse_async_cumsum.hip 2025-05-07T19:42:55.7791900Z Removing fbgemm_gpu/src/sparse_ops/sparse_async_cumsum_hip.cpp 2025-05-07T19:42:55.7792389Z Removing fbgemm_gpu/src/sparse_ops/sparse_batched_unary_embeddings.hip 2025-05-07T19:42:55.7793017Z Removing fbgemm_gpu/src/sparse_ops/sparse_block_bucketize_features.hip 2025-05-07T19:42:55.7793538Z Removing fbgemm_gpu/src/sparse_ops/sparse_bucketize_features.hip 
2025-05-07T19:42:55.7794055Z Removing fbgemm_gpu/src/sparse_ops/sparse_compute_frequency_sequence.hip 2025-05-07T19:42:55.7794631Z Removing fbgemm_gpu/src/sparse_ops/sparse_expand_into_jagged_permute.hip 2025-05-07T19:42:55.7795123Z Removing fbgemm_gpu/src/sparse_ops/sparse_group_index.hip 2025-05-07T19:42:55.7795565Z Removing fbgemm_gpu/src/sparse_ops/sparse_index_add.hip 2025-05-07T19:42:55.7795986Z Removing fbgemm_gpu/src/sparse_ops/sparse_index_select.hip 2025-05-07T19:42:55.7796447Z Removing fbgemm_gpu/src/sparse_ops/sparse_invert_permute.hip 2025-05-07T19:42:55.7796911Z Removing fbgemm_gpu/src/sparse_ops/sparse_ops_cpu_hip.cpp 2025-05-07T19:42:55.7797379Z Removing fbgemm_gpu/src/sparse_ops/sparse_pack_segments_backward.hip 2025-05-07T19:42:55.7797900Z Removing fbgemm_gpu/src/sparse_ops/sparse_pack_segments_forward.hip 2025-05-07T19:42:55.7798375Z Removing fbgemm_gpu/src/sparse_ops/sparse_permute102.hip 2025-05-07T19:42:55.7798826Z Removing fbgemm_gpu/src/sparse_ops/sparse_permute_1d.hip 2025-05-07T19:42:55.7799250Z Removing fbgemm_gpu/src/sparse_ops/sparse_permute_2d.hip 2025-05-07T19:42:55.7799733Z Removing fbgemm_gpu/src/sparse_ops/sparse_permute_embeddings.hip 2025-05-07T19:42:55.7800167Z Removing fbgemm_gpu/src/sparse_ops/sparse_range.hip 2025-05-07T19:42:55.7800661Z Removing fbgemm_gpu/src/sparse_ops/sparse_reorder_batched_ad.hip 2025-05-07T19:42:55.7801150Z Removing fbgemm_gpu/src/sparse_ops/sparse_segment_sum_csr.hip 2025-05-07T19:42:55.7801573Z Removing fbgemm_gpu/src/sparse_ops/sparse_zipf.hip 2025-05-07T19:42:55.7802043Z Removing fbgemm_gpu/src/split_embeddings_cache/cachelib_cache_hip.cpp 2025-05-07T19:42:55.7802560Z Removing fbgemm_gpu/src/split_embeddings_cache/common_hip.cuh 2025-05-07T19:42:55.7803014Z Removing fbgemm_gpu/src/split_embeddings_cache/common_hip.h 2025-05-07T19:42:55.7803502Z Removing fbgemm_gpu/src/split_embeddings_cache/lfu_cache_find.hip 2025-05-07T19:42:55.7804010Z Removing 
fbgemm_gpu/src/split_embeddings_cache/lfu_cache_populate.hip 2025-05-07T19:42:55.7804583Z Removing fbgemm_gpu/src/split_embeddings_cache/lfu_cache_populate_byte.hip 2025-05-07T19:42:55.7805163Z Removing fbgemm_gpu/src/split_embeddings_cache/lfu_cache_populate_byte_hip.cpp 2025-05-07T19:42:55.7805760Z Removing fbgemm_gpu/src/split_embeddings_cache/linearize_cache_indices.hip 2025-05-07T19:42:55.7806442Z Removing fbgemm_gpu/src/split_embeddings_cache/linearize_cache_indices_hip.cpp 2025-05-07T19:42:55.7806976Z Removing fbgemm_gpu/src/split_embeddings_cache/lru_cache_find.hip 2025-05-07T19:42:55.7807490Z Removing fbgemm_gpu/src/split_embeddings_cache/lru_cache_populate.hip 2025-05-07T19:42:55.7808018Z Removing fbgemm_gpu/src/split_embeddings_cache/lru_cache_populate_byte.hip 2025-05-07T19:42:55.7808610Z Removing fbgemm_gpu/src/split_embeddings_cache/lru_cache_populate_byte_hip.cpp 2025-05-07T19:42:55.7809131Z Removing fbgemm_gpu/src/split_embeddings_cache/lxu_cache.hip 2025-05-07T19:42:55.7809626Z Removing fbgemm_gpu/src/split_embeddings_cache/lxu_cache_hip.cpp 2025-05-07T19:42:55.7810162Z Removing fbgemm_gpu/src/split_embeddings_cache/reset_weight_momentum.hip 2025-05-07T19:42:55.7810810Z Removing fbgemm_gpu/src/split_embeddings_cache/split_embeddings_cache_ops.hip 2025-05-07T19:42:55.7811445Z Removing fbgemm_gpu/src/split_embeddings_cache/split_embeddings_cache_ops_hip.cpp 2025-05-07T19:42:55.7812032Z Removing fbgemm_gpu/src/split_embeddings_utils/generate_vbe_metadata.hip 2025-05-07T19:42:55.7812600Z Removing fbgemm_gpu/src/split_embeddings_utils/get_infos_metadata.hip 2025-05-07T19:42:55.7813147Z Removing fbgemm_gpu/src/split_embeddings_utils/radix_sort_pairs.hip 2025-05-07T19:42:55.7813698Z Removing fbgemm_gpu/src/split_embeddings_utils/split_embeddings_utils_hip.cpp 2025-05-07T19:42:55.7814312Z Removing fbgemm_gpu/src/split_embeddings_utils/transpose_embedding_input.hip 2025-05-07T19:42:55.7814912Z Removing 
fbgemm_gpu/src/ssd_split_embeddings_cache/embedding_rocksdb_wrapper_hip.h 2025-05-07T19:42:55.7815576Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/kv_db_hip_utils.cpp 2025-05-07T19:42:55.7816091Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/kv_db_hip_utils.h 2025-05-07T19:42:55.7816698Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/kv_db_table_batched_embeddings_hip.cpp 2025-05-07T19:42:55.7817645Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/kv_db_table_batched_embeddings_hip.h 2025-05-07T19:42:55.7818518Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/kv_tensor_wrapper_cpu_hip.cpp 2025-05-07T19:42:55.7819213Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/ssd_scratch_pad_indices_queue_hip.cpp 2025-05-07T19:42:55.7819899Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/ssd_split_embeddings_cache_hip.hip 2025-05-07T19:42:55.7820637Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/ssd_split_table_batched_embeddings_hip.cpp 2025-05-07T19:42:55.7821372Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/ssd_table_batched_embeddings_hip.h 2025-05-07T19:42:55.7821917Z Removing fbgemm_gpu/src/topology_utils_hip.cpp 2025-05-07T19:42:55.7822360Z Removing fbgemm_gpu/test/tbe/utils/cpu_kernel_test_hip.cpp 2025-05-07T19:42:55.7822818Z Removing fbgemm_gpu/test/utils/kernel_launcher_test.hip 2025-05-07T19:42:55.7823307Z Removing fbgemm_gpu/test/utils/stochastic_rounding_test.hip 2025-05-07T19:42:55.7823773Z Removing fbgemm_gpu/test/utils/tensor_accessor2_test.hip 2025-05-07T19:42:55.7824269Z Removing fbgemm_gpu/test/utils/tensor_accessor_builder_test.hip 2025-05-07T19:42:55.7824863Z Removing fbgemm_gpu/test/utils/tensor_accessor_builder_with_memcheck_test.hip 2025-05-07T19:42:55.7825409Z Removing fbgemm_gpu/test/utils/tensor_accessor_test.hip 2025-05-07T19:42:55.7825920Z Removing fbgemm_gpu/test/utils/tensor_accessor_with_memcheck_test.hip 2025-05-07T19:42:55.7826401Z Removing fbgemm_gpu/test/utils/weight_row_test.hip 
2025-05-07T19:42:55.7828880Z [command]/usr/bin/git reset --hard HEAD 2025-05-07T19:42:55.8725424Z HEAD is now at 1c9ad64 Merge f6528e7b1e8f5602e7dba30cd73b48ae6630981c into fd4df5f456e0cca514bacd98a39efb72990fd9f4 2025-05-07T19:42:55.8728904Z ##[endgroup] 2025-05-07T19:42:55.8730101Z ##[group]Disabling automatic garbage collection 2025-05-07T19:42:55.8732736Z [command]/usr/bin/git config --local gc.auto 0 2025-05-07T19:42:55.8761084Z ##[endgroup] 2025-05-07T19:42:55.8762172Z ##[group]Setting up auth 2025-05-07T19:42:55.8765218Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-05-07T19:42:55.8789620Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-05-07T19:42:55.9060018Z Entering 'external/asmjit' 2025-05-07T19:42:55.9121525Z Entering 'external/composable_kernel' 2025-05-07T19:42:55.9185200Z Entering 'external/cpuinfo' 2025-05-07T19:42:55.9246571Z Entering 'external/cutlass' 2025-05-07T19:42:55.9312826Z Entering 'external/googletest' 2025-05-07T19:42:55.9369639Z Entering 'external/hipify_torch' 2025-05-07T19:42:55.9422315Z Entering 'external/json' 2025-05-07T19:42:55.9495649Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-05-07T19:42:55.9520656Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-05-07T19:42:55.9815070Z Entering 'external/asmjit' 2025-05-07T19:42:55.9862861Z Entering 'external/composable_kernel' 2025-05-07T19:42:55.9925250Z Entering 'external/cpuinfo' 2025-05-07T19:42:55.9973376Z Entering 'external/cutlass' 2025-05-07T19:42:56.0036363Z Entering 'external/googletest' 2025-05-07T19:42:56.0087573Z Entering 'external/hipify_torch' 
2025-05-07T19:42:56.0136658Z Entering 'external/json' 2025-05-07T19:42:56.0200401Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-05-07T19:42:56.0236583Z ##[endgroup] 2025-05-07T19:42:56.0237035Z ##[group]Fetching the repository 2025-05-07T19:42:56.0244303Z [command]/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +a2f4c52051596e74bc8c16e3d2867a4ecdd271e0:refs/remotes/pull/4066/merge 2025-05-07T19:42:56.1635035Z From https://github.com/pytorch/FBGEMM 2025-05-07T19:42:56.1636811Z + 1c9ad64...a2f4c52 a2f4c52051596e74bc8c16e3d2867a4ecdd271e0 -> pull/4066/merge (forced update) 2025-05-07T19:42:56.1654061Z ##[endgroup] 2025-05-07T19:42:56.1655212Z ##[group]Determining the checkout info 2025-05-07T19:42:56.1656555Z ##[endgroup] 2025-05-07T19:42:56.1657872Z [command]/usr/bin/git sparse-checkout disable 2025-05-07T19:42:56.2149970Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-05-07T19:42:56.2175309Z ##[group]Checking out the ref 2025-05-07T19:42:56.2176249Z [command]/usr/bin/git checkout --progress --force refs/remotes/pull/4066/merge 2025-05-07T19:42:56.2249524Z Warning: you are leaving 1 commit behind, not connected to 2025-05-07T19:42:56.2250711Z any of your branches: 2025-05-07T19:42:56.2251201Z 2025-05-07T19:42:56.2251975Z 1c9ad64 Merge f6528e7b1e8f5602e7dba30cd73b48ae6630981c into fd4df5f456e0cca514bacd98a39efb72990fd9f4 2025-05-07T19:42:56.2252446Z 2025-05-07T19:42:56.2252691Z If you want to keep it by creating a new branch, this may be a good time 2025-05-07T19:42:56.2253156Z to do so with: 2025-05-07T19:42:56.2253297Z 2025-05-07T19:42:56.2253458Z git branch 1c9ad64 2025-05-07T19:42:56.2253675Z 2025-05-07T19:42:56.2254084Z HEAD is now at a2f4c52 Merge 6060cd4b5f971680caecdcc657faccb5720d1c3e into fd4df5f456e0cca514bacd98a39efb72990fd9f4 2025-05-07T19:42:56.2255318Z ##[endgroup] 2025-05-07T19:42:56.2255765Z ##[group]Setting 
up auth for fetching submodules 2025-05-07T19:42:56.2257223Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-05-07T19:42:56.2299852Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-05-07T19:42:56.2319406Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-05-07T19:42:56.2342095Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-05-07T19:42:56.2365919Z ##[endgroup] 2025-05-07T19:42:56.2367049Z ##[group]Fetching submodules 2025-05-07T19:42:56.2368357Z [command]/usr/bin/git submodule sync 2025-05-07T19:42:56.2659784Z Synchronizing submodule url for 'external/asmjit' 2025-05-07T19:42:56.2661027Z Synchronizing submodule url for 'external/composable_kernel' 2025-05-07T19:42:56.2661632Z Synchronizing submodule url for 'external/cpuinfo' 2025-05-07T19:42:56.2662074Z Synchronizing submodule url for 'external/cutlass' 2025-05-07T19:42:56.2662485Z Synchronizing submodule url for 'external/googletest' 2025-05-07T19:42:56.2662942Z Synchronizing submodule url for 'external/hipify_torch' 2025-05-07T19:42:56.2663366Z Synchronizing submodule url for 'external/json' 2025-05-07T19:42:56.2666594Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --depth=1 2025-05-07T19:42:56.3450767Z Submodule path 'external/asmjit': checked out 'e5d7c0bd5d9aec44d68830187138149e6a8c4e32' 2025-05-07T19:42:56.6090255Z Submodule path 'external/composable_kernel': checked out '4a61bdd4bd4ed730e078aebc7c0fcf046ff29406' 2025-05-07T19:42:56.7003390Z Submodule path 'external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-05-07T19:42:57.3585272Z Submodule path 'external/cutlass': checked out '3ed8d2ec4ba35ef5d9d8353826209b6f868f63d3' 2025-05-07T19:42:57.3975174Z Submodule path 'external/googletest': checked out 'f8d7d77c06936315286eb55f8de22cd23c188571' 
2025-05-07T19:42:57.4069036Z Submodule path 'external/hipify_torch': checked out '420084499c7c1e1c2d801922f40df202eac5f3a0' 2025-05-07T19:42:57.5108409Z Submodule path 'external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-05-07T19:42:57.5128302Z [command]/usr/bin/git submodule foreach git config --local gc.auto 0 2025-05-07T19:42:57.5414599Z Entering 'external/asmjit' 2025-05-07T19:42:57.5438727Z Entering 'external/composable_kernel' 2025-05-07T19:42:57.5471909Z Entering 'external/cpuinfo' 2025-05-07T19:42:57.5499089Z Entering 'external/cutlass' 2025-05-07T19:42:57.5531828Z Entering 'external/googletest' 2025-05-07T19:42:57.5558533Z Entering 'external/hipify_torch' 2025-05-07T19:42:57.5593455Z Entering 'external/json' 2025-05-07T19:42:57.5643044Z ##[endgroup] 2025-05-07T19:42:57.5643545Z ##[group]Persisting credentials for submodules 2025-05-07T19:42:57.5646371Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-05-07T19:42:57.5902598Z Entering 'external/asmjit' 2025-05-07T19:42:57.5938208Z url.https://github.com/.insteadof 2025-05-07T19:42:57.5938613Z url.https://github.com/.insteadof 2025-05-07T19:42:57.5967902Z Entering 'external/composable_kernel' 2025-05-07T19:42:57.5999872Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6000419Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6036990Z Entering 'external/cpuinfo' 2025-05-07T19:42:57.6079389Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6080434Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6113672Z Entering 'external/cutlass' 2025-05-07T19:42:57.6143208Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6143600Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6197665Z Entering 'external/googletest' 2025-05-07T19:42:57.6240468Z url.https://github.com/.insteadof 
2025-05-07T19:42:57.6241491Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6273307Z Entering 'external/hipify_torch' 2025-05-07T19:42:57.6317503Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6318534Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6351938Z Entering 'external/json' 2025-05-07T19:42:57.6389999Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6391731Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6441321Z [command]/usr/bin/git submodule foreach sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-05-07T19:42:57.6748005Z Entering 'external/asmjit' 2025-05-07T19:42:57.6800078Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/asmjit/config remote.origin.url 2025-05-07T19:42:57.6802911Z Entering 'external/composable_kernel' 2025-05-07T19:42:57.6853280Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/composable_kernel/config remote.origin.url 2025-05-07T19:42:57.6854313Z Entering 'external/cpuinfo' 2025-05-07T19:42:57.6900380Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/cpuinfo/config remote.origin.url 2025-05-07T19:42:57.6901889Z Entering 'external/cutlass' 2025-05-07T19:42:57.6946975Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/cutlass/config remote.origin.url 2025-05-07T19:42:57.6955974Z Entering 'external/googletest' 2025-05-07T19:42:57.7002031Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/googletest/config remote.origin.url 2025-05-07T19:42:57.7005385Z Entering 'external/hipify_torch' 2025-05-07T19:42:57.7055243Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/hipify_torch/config remote.origin.url 2025-05-07T19:42:57.7056798Z Entering 'external/json' 2025-05-07T19:42:57.7100306Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/json/config remote.origin.url 2025-05-07T19:42:57.7171327Z [command]/usr/bin/git submodule foreach git config --local --add 'url.https://github.com/.insteadOf' 
'git@github.com:' 2025-05-07T19:42:57.7451358Z Entering 'external/asmjit' 2025-05-07T19:42:57.7471447Z Entering 'external/composable_kernel' 2025-05-07T19:42:57.7499093Z Entering 'external/cpuinfo' 2025-05-07T19:42:57.7524880Z Entering 'external/cutlass' 2025-05-07T19:42:57.7554070Z Entering 'external/googletest' 2025-05-07T19:42:57.7576785Z Entering 'external/hipify_torch' 2025-05-07T19:42:57.7604267Z Entering 'external/json' 2025-05-07T19:42:57.7641636Z [command]/usr/bin/git submodule foreach git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-05-07T19:42:57.7916193Z Entering 'external/asmjit' 2025-05-07T19:42:57.7938829Z Entering 'external/composable_kernel' 2025-05-07T19:42:57.7964662Z Entering 'external/cpuinfo' 2025-05-07T19:42:57.7984171Z Entering 'external/cutlass' 2025-05-07T19:42:57.8011298Z Entering 'external/googletest' 2025-05-07T19:42:57.8044236Z Entering 'external/hipify_torch' 2025-05-07T19:42:57.8066419Z Entering 'external/json' 2025-05-07T19:42:57.8102433Z ##[endgroup] 2025-05-07T19:42:57.8131634Z [command]/usr/bin/git log -1 --format=%H 2025-05-07T19:42:57.8153767Z a2f4c52051596e74bc8c16e3d2867a4ecdd271e0 2025-05-07T19:42:57.8287828Z ##[group]Run . $PRELUDE; print_system_info 2025-05-07T19:42:57.8288215Z . 
$PRELUDE; print_system_info 2025-05-07T19:42:57.8288803Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:42:57.8289140Z env: 2025-05-07T19:42:57.8289361Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:42:57.8289646Z BUILD_ENV: build_binary 2025-05-07T19:42:57.8289894Z BUILD_TARGET: genai 2025-05-07T19:42:57.8290104Z BUILD_VARIANT: cuda 2025-05-07T19:42:57.8290333Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:42:57.8290561Z ##[endgroup] 2025-05-07T19:42:58.2870485Z ################################################################################ 2025-05-07T19:42:58.2871526Z # Print System Info 2025-05-07T19:42:58.2872147Z # 2025-05-07T19:42:58.2887550Z # [2025-05-07T19:42:58.288Z] + print_system_info 2025-05-07T19:42:58.2888672Z ################################################################################ 2025-05-07T19:42:58.2889365Z 2025-05-07T19:42:58.2889801Z ################################################################################ 2025-05-07T19:42:58.2890796Z [INFO] Printing environment variables ... 
2025-05-07T19:42:58.2891677Z + printenv 2025-05-07T19:42:58.2892037Z 2025-05-07T19:42:58.2896107Z GITHUB_WORKSPACE=/__w/FBGEMM/FBGEMM 2025-05-07T19:42:58.2896709Z BUILD_VARIANT=cuda 2025-05-07T19:42:58.2896952Z HOSTNAME=684dd9e27594 2025-05-07T19:42:58.2897531Z GITHUB_PATH=/__w/_temp/_runner_file_commands/add_path_7aa01c13-b7e4-4bae-9ddd-0e4cfa7d25d7 2025-05-07T19:42:58.2898147Z GITHUB_ACTION=__run_2 2025-05-07T19:42:58.2898463Z GITHUB_RUN_NUMBER=10601 2025-05-07T19:42:58.2898728Z RUNNER_NAME=i-0009551202268beab 2025-05-07T19:42:58.2899046Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-05-07T19:42:58.2899374Z PLATFORM_NAME_LC=linux-x86_64 2025-05-07T19:42:58.2899677Z MACHINE_NAME_LC=x86_64 2025-05-07T19:42:58.2899928Z GITHUB_TRIGGERING_ACTOR=q10 2025-05-07T19:42:58.2900248Z PRELUDE=.github/scripts/setup_env.bash 2025-05-07T19:42:58.2900563Z GITHUB_REF_TYPE=branch 2025-05-07T19:42:58.2901002Z *** 2025-05-07T19:42:58.2901248Z GITHUB_REPOSITORY_ID=150154628 2025-05-07T19:42:58.2901527Z GITHUB_ACTIONS=true 2025-05-07T19:42:58.2901836Z GITHUB_SHA=a2f4c52051596e74bc8c16e3d2867a4ecdd271e0 2025-05-07T19:42:58.2902430Z GITHUB_WORKFLOW_REF=pytorch/FBGEMM/.github/workflows/fbgemm_gpu_ci_cuda.yml@refs/pull/4066/merge 2025-05-07T19:42:58.2903015Z RUNNER_ENVIRONMENT=self-hosted 2025-05-07T19:42:58.2903304Z GITHUB_REF=refs/pull/4066/merge 2025-05-07T19:42:58.2903615Z RUNNER_OS=Linux 2025-05-07T19:42:58.2903883Z GITHUB_REF_PROTECTED=false 2025-05-07T19:42:58.2904146Z HOME=/github/home 2025-05-07T19:42:58.2904435Z GITHUB_API_URL=https://api.github.com 2025-05-07T19:42:58.2904855Z RUNNER_ARCH=X64 2025-05-07T19:42:58.2905102Z RUNNER_TEMP=/__w/_temp 2025-05-07T19:42:58.2905441Z BUILD_TARGET=genai 2025-05-07T19:42:58.2905867Z GITHUB_STATE=/__w/_temp/_runner_file_commands/save_state_7aa01c13-b7e4-4bae-9ddd-0e4cfa7d25d7 2025-05-07T19:42:58.2906506Z GITHUB_ENV=/__w/_temp/_runner_file_commands/set_env_7aa01c13-b7e4-4bae-9ddd-0e4cfa7d25d7 2025-05-07T19:42:58.2907011Z 
GITHUB_EVENT_PATH=/github/workflow/event.json 2025-05-07T19:42:58.2907353Z GITHUB_EVENT_NAME=pull_request 2025-05-07T19:42:58.2907615Z GITHUB_RUN_ID=14891846252 2025-05-07T19:42:58.2908306Z GITHUB_STEP_SUMMARY=/__w/_temp/_runner_file_commands/step_summary_7aa01c13-b7e4-4bae-9ddd-0e4cfa7d25d7 2025-05-07T19:42:58.2908810Z BUILD_ENV=build_binary 2025-05-07T19:42:58.2909061Z GITHUB_ACTOR=q10 2025-05-07T19:42:58.2909274Z GITHUB_RUN_ATTEMPT=1 2025-05-07T19:42:58.2909525Z KERN_NAME_LC=linux 2025-05-07T19:42:58.2909746Z BUILD_CUDA_VERSION=12.8.0 2025-05-07T19:42:58.2910065Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-05-07T19:42:58.2910403Z PLATFORM_NAME=Linux-x86_64 2025-05-07T19:42:58.2910696Z GITHUB_SERVER_URL=https://github.com 2025-05-07T19:42:58.2910991Z SHLVL=1 2025-05-07T19:42:58.2911190Z GITHUB_ACTOR_ID=255046 2025-05-07T19:42:58.2911501Z RUNNER_TOOL_CACHE=/__w/_tool 2025-05-07T19:42:58.2911971Z GITHUB_WORKFLOW_SHA=6060cd4b5f971680caecdcc657faccb5720d1c3e 2025-05-07T19:42:58.2912362Z GITHUB_REF_NAME=4066/merge 2025-05-07T19:42:58.2912599Z KERN_NAME=Linux 2025-05-07T19:42:58.2912842Z GITHUB_JOB=build_artifact 2025-05-07T19:42:58.2913105Z GITHUB_REPOSITORY=pytorch/FBGEMM 2025-05-07T19:42:58.2913405Z GITHUB_RETENTION_DAYS=90 2025-05-07T19:42:58.2913683Z RUNNER_WORKSPACE=/__w/FBGEMM 2025-05-07T19:42:58.2913945Z GITHUB_ACTION_REPOSITORY= 2025-05-07T19:42:58.2914306Z PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-05-07T19:42:58.2914850Z GITHUB_BASE_REF=main 2025-05-07T19:42:58.2915110Z CI=true 2025-05-07T19:42:58.2915332Z GITHUB_REPOSITORY_OWNER=pytorch 2025-05-07T19:42:58.2915813Z GITHUB_HEAD_REF=bm/genai-rocm-oss-6 2025-05-07T19:42:58.2916108Z GITHUB_ACTION_REF= 2025-05-07T19:42:58.2916388Z GITHUB_WORKFLOW=FBGEMM GPU/GenAI CUDA CI 2025-05-07T19:42:58.2916908Z GITHUB_OUTPUT=/__w/_temp/_runner_file_commands/set_output_7aa01c13-b7e4-4bae-9ddd-0e4cfa7d25d7 2025-05-07T19:42:58.2917436Z MACHINE_NAME=x86_64 2025-05-07T19:42:58.2917691Z 
_=/usr/bin/printenv 2025-05-07T19:42:58.2917832Z 2025-05-07T19:42:58.2917958Z ################################################################################ 2025-05-07T19:42:58.2918317Z [INFO] Print ldd version ... 2025-05-07T19:42:58.2918596Z + ldd --version 2025-05-07T19:42:58.2918756Z 2025-05-07T19:42:58.2918861Z ldd (GNU libc) 2.34 2025-05-07T19:42:58.2919147Z Copyright (C) 2021 Free Software Foundation, Inc. 2025-05-07T19:42:58.2919644Z This is free software; see the source for copying conditions. There is NO 2025-05-07T19:42:58.2920241Z warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 2025-05-07T19:42:58.2920728Z Written by Roland McGrath and Ulrich Drepper. 2025-05-07T19:42:58.2920969Z 2025-05-07T19:42:58.2921113Z ################################################################################ 2025-05-07T19:42:58.2921445Z [INFO] Print CPU info ... 2025-05-07T19:42:58.2921717Z + nproc 2025-05-07T19:42:58.2921832Z 2025-05-07T19:42:58.2926514Z 96 2025-05-07T19:42:58.2927418Z 2025-05-07T19:42:58.2927908Z + lscpu 2025-05-07T19:42:58.2928285Z 2025-05-07T19:42:58.3185802Z Architecture: x86_64 2025-05-07T19:42:58.3187011Z CPU op-mode(s): 32-bit, 64-bit 2025-05-07T19:42:58.3188298Z Address sizes: 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3189496Z Byte Order: Little Endian 2025-05-07T19:42:58.3190491Z CPU(s): 96 2025-05-07T19:42:58.3191364Z On-line CPU(s) list: 0-95 2025-05-07T19:42:58.3192354Z Vendor ID: GenuineIntel 2025-05-07T19:42:58.3193556Z Model name: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3194175Z CPU family: 6 2025-05-07T19:42:58.3194481Z Model: 85 2025-05-07T19:42:58.3194770Z Thread(s) per core: 2 2025-05-07T19:42:58.3195091Z Core(s) per socket: 24 2025-05-07T19:42:58.3195380Z Socket(s): 2 2025-05-07T19:42:58.3195688Z Stepping: 7 2025-05-07T19:42:58.3195987Z BogoMIPS: 6000.01 2025-05-07T19:42:58.3198520Z Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush 
mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3201321Z Hypervisor vendor: KVM 2025-05-07T19:42:58.3201809Z Virtualization type: full 2025-05-07T19:42:58.3202189Z L1d cache: 1.5 MiB (48 instances) 2025-05-07T19:42:58.3202609Z L1i cache: 1.5 MiB (48 instances) 2025-05-07T19:42:58.3202997Z L2 cache: 48 MiB (48 instances) 2025-05-07T19:42:58.3203380Z L3 cache: 71.5 MiB (2 instances) 2025-05-07T19:42:58.3203739Z NUMA node(s): 2 2025-05-07T19:42:58.3204047Z NUMA node0 CPU(s): 0-23,48-71 2025-05-07T19:42:58.3204438Z NUMA node1 CPU(s): 24-47,72-95 2025-05-07T19:42:58.3204947Z Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status 2025-05-07T19:42:58.3205576Z Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported 2025-05-07T19:42:58.3206106Z Vulnerability L1tf: Mitigation; PTE Inversion 2025-05-07T19:42:58.3206780Z Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:42:58.3207422Z Vulnerability Meltdown: Mitigation; PTI 2025-05-07T19:42:58.3208074Z Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:42:58.3208756Z Vulnerability Reg file data sampling: Not affected 2025-05-07T19:42:58.3209160Z Vulnerability Retbleed: Vulnerable 2025-05-07T19:42:58.3209575Z Vulnerability Spec rstack overflow: Not affected 2025-05-07T19:42:58.3209980Z Vulnerability Spec store bypass: Vulnerable 2025-05-07T19:42:58.3210566Z Vulnerability Spectre v1: 
Mitigation; usercopy/swapgs barriers and __user pointer sanitization 2025-05-07T19:42:58.3211474Z Vulnerability Spectre v2: Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline 2025-05-07T19:42:58.3212140Z Vulnerability Srbds: Not affected 2025-05-07T19:42:58.3212543Z Vulnerability Tsx async abort: Not affected 2025-05-07T19:42:58.3212795Z 2025-05-07T19:42:58.3212886Z + cat /proc/cpuinfo 2025-05-07T19:42:58.3213039Z 2025-05-07T19:42:58.3213422Z processor : 0 2025-05-07T19:42:58.3213643Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3213928Z cpu family : 6 2025-05-07T19:42:58.3214149Z model : 85 2025-05-07T19:42:58.3214467Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3214835Z stepping : 7 2025-05-07T19:42:58.3215082Z microcode : 0x5003901 2025-05-07T19:42:58.3215344Z cpu MHz : 3271.286 2025-05-07T19:42:58.3215575Z cache size : 36608 KB 2025-05-07T19:42:58.3215836Z physical id : 0 2025-05-07T19:42:58.3216060Z siblings : 48 2025-05-07T19:42:58.3216296Z core id : 0 2025-05-07T19:42:58.3216520Z cpu cores : 24 2025-05-07T19:42:58.3216762Z apicid : 0 2025-05-07T19:42:58.3216960Z initial apicid : 0 2025-05-07T19:42:58.3217316Z fpu : yes 2025-05-07T19:42:58.3217516Z fpu_exception : yes 2025-05-07T19:42:58.3217750Z cpuid level : 13 2025-05-07T19:42:58.3218049Z wp : yes 2025-05-07T19:42:58.3220418Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3223287Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3223911Z bogomips : 6000.01 2025-05-07T19:42:58.3224125Z clflush size : 64 2025-05-07T19:42:58.3224365Z cache_alignment : 64 2025-05-07T19:42:58.3224656Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3225092Z power management: 2025-05-07T19:42:58.3225235Z 2025-05-07T19:42:58.3225327Z processor : 1 2025-05-07T19:42:58.3225584Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3225849Z cpu family : 6 2025-05-07T19:42:58.3226107Z model : 85 2025-05-07T19:42:58.3226403Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3226819Z stepping : 7 2025-05-07T19:42:58.3227071Z microcode : 0x5003901 2025-05-07T19:42:58.3227314Z cpu MHz : 3251.001 2025-05-07T19:42:58.3227574Z cache size : 36608 KB 2025-05-07T19:42:58.3227813Z physical id : 0 2025-05-07T19:42:58.3228072Z siblings : 48 2025-05-07T19:42:58.3228286Z core id : 1 2025-05-07T19:42:58.3228547Z cpu cores : 24 2025-05-07T19:42:58.3228766Z apicid : 2 2025-05-07T19:42:58.3229011Z initial apicid : 2 2025-05-07T19:42:58.3229251Z fpu : yes 2025-05-07T19:42:58.3229487Z fpu_exception : yes 2025-05-07T19:42:58.3229730Z cpuid level : 13 2025-05-07T19:42:58.3229977Z wp : yes 2025-05-07T19:42:58.3232520Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3235271Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3235877Z bogomips : 6000.01 2025-05-07T19:42:58.3236130Z clflush size : 64 2025-05-07T19:42:58.3236358Z cache_alignment : 64 2025-05-07T19:42:58.3236665Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3237009Z power management: 2025-05-07T19:42:58.3237173Z 2025-05-07T19:42:58.3237270Z processor : 2 2025-05-07T19:42:58.3237499Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3237780Z cpu family : 6 2025-05-07T19:42:58.3237999Z model : 85 2025-05-07T19:42:58.3238301Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3238714Z stepping : 7 2025-05-07T19:42:58.3238942Z microcode : 0x5003901 2025-05-07T19:42:58.3239211Z cpu MHz : 3261.969 2025-05-07T19:42:58.3239446Z cache size : 36608 KB 2025-05-07T19:42:58.3239722Z physical id : 0 2025-05-07T19:42:58.3239949Z siblings : 48 2025-05-07T19:42:58.3240194Z core id : 2 2025-05-07T19:42:58.3240439Z cpu cores : 24 2025-05-07T19:42:58.3240664Z apicid : 4 2025-05-07T19:42:58.3240908Z initial apicid : 4 2025-05-07T19:42:58.3241144Z fpu : yes 2025-05-07T19:42:58.3241391Z fpu_exception : yes 2025-05-07T19:42:58.3241630Z cpuid level : 13 2025-05-07T19:42:58.3241882Z wp : yes 2025-05-07T19:42:58.3244245Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3247144Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3247764Z bogomips : 6000.01 2025-05-07T19:42:58.3247994Z clflush size : 64 2025-05-07T19:42:58.3248257Z cache_alignment : 64 2025-05-07T19:42:58.3248544Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3248913Z power management: 2025-05-07T19:42:58.3249052Z 2025-05-07T19:42:58.3249166Z processor : 3 2025-05-07T19:42:58.3249440Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3249712Z cpu family : 6 2025-05-07T19:42:58.3249927Z model : 85 2025-05-07T19:42:58.3250235Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3250596Z stepping : 7 2025-05-07T19:42:58.3250835Z microcode : 0x5003901 2025-05-07T19:42:58.3251073Z cpu MHz : 3275.574 2025-05-07T19:42:58.3251319Z cache size : 36608 KB 2025-05-07T19:42:58.3251549Z physical id : 0 2025-05-07T19:42:58.3251784Z siblings : 48 2025-05-07T19:42:58.3252012Z core id : 3 2025-05-07T19:42:58.3252216Z cpu cores : 24 2025-05-07T19:42:58.3252446Z apicid : 6 2025-05-07T19:42:58.3252654Z initial apicid : 6 2025-05-07T19:42:58.3252898Z fpu : yes 2025-05-07T19:42:58.3253105Z fpu_exception : yes 2025-05-07T19:42:58.3253358Z cpuid level : 13 2025-05-07T19:42:58.3253578Z wp : yes 2025-05-07T19:42:58.3255898Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3258833Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3259423Z bogomips : 6000.01 2025-05-07T19:42:58.3259650Z clflush size : 64 2025-05-07T19:42:58.3259863Z cache_alignment : 64 2025-05-07T19:42:58.3260149Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3260489Z power management: 2025-05-07T19:42:58.3260624Z 2025-05-07T19:42:58.3260707Z processor : 4 2025-05-07T19:42:58.3260945Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3261205Z cpu family : 6 2025-05-07T19:42:58.3261445Z model : 85 2025-05-07T19:42:58.3261736Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3262128Z stepping : 7 2025-05-07T19:42:58.3262354Z microcode : 0x5003901 2025-05-07T19:42:58.3262616Z cpu MHz : 3406.628 2025-05-07T19:42:58.3262845Z cache size : 36608 KB 2025-05-07T19:42:58.3263110Z physical id : 0 2025-05-07T19:42:58.3263325Z siblings : 48 2025-05-07T19:42:58.3263561Z core id : 4 2025-05-07T19:42:58.3263791Z cpu cores : 24 2025-05-07T19:42:58.3264010Z apicid : 8 2025-05-07T19:42:58.3264241Z initial apicid : 8 2025-05-07T19:42:58.3264466Z fpu : yes 2025-05-07T19:42:58.3264701Z fpu_exception : yes 2025-05-07T19:42:58.3264937Z cpuid level : 13 2025-05-07T19:42:58.3265178Z wp : yes 2025-05-07T19:42:58.3267718Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3270578Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3271206Z bogomips : 6000.01 2025-05-07T19:42:58.3271431Z clflush size : 64 2025-05-07T19:42:58.3271672Z cache_alignment : 64 2025-05-07T19:42:58.3271950Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3272319Z power management: 2025-05-07T19:42:58.3272461Z 2025-05-07T19:42:58.3272574Z processor : 5 2025-05-07T19:42:58.3272804Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3273078Z cpu family : 6 2025-05-07T19:42:58.3273384Z model : 85 2025-05-07T19:42:58.3273701Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3274067Z stepping : 7 2025-05-07T19:42:58.3274311Z microcode : 0x5003901 2025-05-07T19:42:58.3274558Z cpu MHz : 3237.441 2025-05-07T19:42:58.3274813Z cache size : 36608 KB 2025-05-07T19:42:58.3275051Z physical id : 0 2025-05-07T19:42:58.3275296Z siblings : 48 2025-05-07T19:42:58.3275534Z core id : 5 2025-05-07T19:42:58.3275744Z cpu cores : 24 2025-05-07T19:42:58.3275983Z apicid : 10 2025-05-07T19:42:58.3276200Z initial apicid : 10 2025-05-07T19:42:58.3276449Z fpu : yes 2025-05-07T19:42:58.3276665Z fpu_exception : yes 2025-05-07T19:42:58.3276919Z cpuid level : 13 2025-05-07T19:42:58.3277143Z wp : yes 2025-05-07T19:42:58.3279626Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3282307Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3282895Z bogomips : 6000.01 2025-05-07T19:42:58.3283145Z clflush size : 64 2025-05-07T19:42:58.3283370Z cache_alignment : 64 2025-05-07T19:42:58.3283674Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3284026Z power management: 2025-05-07T19:42:58.3284165Z 2025-05-07T19:42:58.3284256Z processor : 6 2025-05-07T19:42:58.3284506Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3284760Z cpu family : 6 2025-05-07T19:42:58.3285007Z model : 85 2025-05-07T19:42:58.3285299Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3285693Z stepping : 7 2025-05-07T19:42:58.3285920Z microcode : 0x5003901 2025-05-07T19:42:58.3286190Z cpu MHz : 3255.471 2025-05-07T19:42:58.3286432Z cache size : 36608 KB 2025-05-07T19:42:58.3286701Z physical id : 0 2025-05-07T19:42:58.3286924Z siblings : 48 2025-05-07T19:42:58.3308111Z core id : 6 2025-05-07T19:42:58.3308403Z cpu cores : 24 2025-05-07T19:42:58.3308661Z apicid : 12 2025-05-07T19:42:58.3308869Z initial apicid : 12 2025-05-07T19:42:58.3309105Z fpu : yes 2025-05-07T19:42:58.3309307Z fpu_exception : yes 2025-05-07T19:42:58.3309540Z cpuid level : 13 2025-05-07T19:42:58.3309858Z wp : yes 2025-05-07T19:42:58.3312153Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3314944Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3315522Z bogomips : 6000.01 2025-05-07T19:42:58.3315752Z clflush size : 64 2025-05-07T19:42:58.3315965Z cache_alignment : 64 2025-05-07T19:42:58.3316252Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3316590Z power management: 2025-05-07T19:42:58.3316723Z 2025-05-07T19:42:58.3316805Z processor : 7 2025-05-07T19:42:58.3317029Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3317264Z cpu family : 6 2025-05-07T19:42:58.3317482Z model : 85 2025-05-07T19:42:58.3317753Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3318183Z stepping : 7 2025-05-07T19:42:58.3318386Z microcode : 0x5003901 2025-05-07T19:42:58.3318629Z cpu MHz : 3249.439 2025-05-07T19:42:58.3318841Z cache size : 36608 KB 2025-05-07T19:42:58.3319076Z physical id : 0 2025-05-07T19:42:58.3319299Z siblings : 48 2025-05-07T19:42:58.3319493Z core id : 7 2025-05-07T19:42:58.3319696Z cpu cores : 24 2025-05-07T19:42:58.3319891Z apicid : 14 2025-05-07T19:42:58.3320102Z initial apicid : 14 2025-05-07T19:42:58.3320310Z fpu : yes 2025-05-07T19:42:58.3320623Z fpu_exception : yes 2025-05-07T19:42:58.3320995Z cpuid level : 13 2025-05-07T19:42:58.3321242Z wp : yes 2025-05-07T19:42:58.3323558Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3326224Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3326813Z bogomips : 6000.01 2025-05-07T19:42:58.3327025Z clflush size : 64 2025-05-07T19:42:58.3327250Z cache_alignment : 64 2025-05-07T19:42:58.3327530Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3327852Z power management: 2025-05-07T19:42:58.3327986Z 2025-05-07T19:42:58.3328086Z processor : 8 2025-05-07T19:42:58.3328296Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3328547Z cpu family : 6 2025-05-07T19:42:58.3328745Z model : 85 2025-05-07T19:42:58.3329028Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3329373Z stepping : 7 2025-05-07T19:42:58.3329591Z microcode : 0x5003901 2025-05-07T19:42:58.3329808Z cpu MHz : 3000.006 2025-05-07T19:42:58.3330040Z cache size : 36608 KB 2025-05-07T19:42:58.3330256Z physical id : 0 2025-05-07T19:42:58.3330476Z siblings : 48 2025-05-07T19:42:58.3330686Z core id : 8 2025-05-07T19:42:58.3330878Z cpu cores : 24 2025-05-07T19:42:58.3331301Z apicid : 16 2025-05-07T19:42:58.3331500Z initial apicid : 16 2025-05-07T19:42:58.3331727Z fpu : yes 2025-05-07T19:42:58.3331927Z fpu_exception : yes 2025-05-07T19:42:58.3332158Z cpuid level : 13 2025-05-07T19:42:58.3332364Z wp : yes 2025-05-07T19:42:58.3334696Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3337527Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3338211Z bogomips : 6000.01 2025-05-07T19:42:58.3338455Z clflush size : 64 2025-05-07T19:42:58.3338678Z cache_alignment : 64 2025-05-07T19:42:58.3338981Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3339344Z power management: 2025-05-07T19:42:58.3339485Z 2025-05-07T19:42:58.3339578Z processor : 9 2025-05-07T19:42:58.3339818Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3340056Z cpu family : 6 2025-05-07T19:42:58.3340275Z model : 85 2025-05-07T19:42:58.3340553Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3340924Z stepping : 7 2025-05-07T19:42:58.3341131Z microcode : 0x5003901 2025-05-07T19:42:58.3341425Z cpu MHz : 3000.006 2025-05-07T19:42:58.3341640Z cache size : 36608 KB 2025-05-07T19:42:58.3341876Z physical id : 0 2025-05-07T19:42:58.3342095Z siblings : 48 2025-05-07T19:42:58.3342289Z core id : 9 2025-05-07T19:42:58.3342501Z cpu cores : 24 2025-05-07T19:42:58.3342701Z apicid : 18 2025-05-07T19:42:58.3342913Z initial apicid : 18 2025-05-07T19:42:58.3343125Z fpu : yes 2025-05-07T19:42:58.3343335Z fpu_exception : yes 2025-05-07T19:42:58.3343547Z cpuid level : 13 2025-05-07T19:42:58.3343764Z wp : yes 2025-05-07T19:42:58.3346075Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3348797Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3349395Z bogomips : 6000.01 2025-05-07T19:42:58.3349726Z clflush size : 64 2025-05-07T19:42:58.3349951Z cache_alignment : 64 2025-05-07T19:42:58.3350232Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3350547Z power management: 2025-05-07T19:42:58.3350677Z 2025-05-07T19:42:58.3350773Z processor : 10 2025-05-07T19:42:58.3350983Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3351231Z cpu family : 6 2025-05-07T19:42:58.3351426Z model : 85 2025-05-07T19:42:58.3351701Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3352040Z stepping : 7 2025-05-07T19:42:58.3352253Z microcode : 0x5003901 2025-05-07T19:42:58.3352470Z cpu MHz : 3246.226 2025-05-07T19:42:58.3352698Z cache size : 36608 KB 2025-05-07T19:42:58.3352917Z physical id : 0 2025-05-07T19:42:58.3353133Z siblings : 48 2025-05-07T19:42:58.3353342Z core id : 10 2025-05-07T19:42:58.3353530Z cpu cores : 24 2025-05-07T19:42:58.3353745Z apicid : 20 2025-05-07T19:42:58.3353944Z initial apicid : 20 2025-05-07T19:42:58.3354164Z fpu : yes 2025-05-07T19:42:58.3354354Z fpu_exception : yes 2025-05-07T19:42:58.3354578Z cpuid level : 13 2025-05-07T19:42:58.3354776Z wp : yes 2025-05-07T19:42:58.3357057Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3359692Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3360327Z bogomips : 6000.01 2025-05-07T19:42:58.3360551Z clflush size : 64 2025-05-07T19:42:58.3360764Z cache_alignment : 64 2025-05-07T19:42:58.3361046Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3361378Z power management: 2025-05-07T19:42:58.3361509Z 2025-05-07T19:42:58.3361593Z processor : 11 2025-05-07T19:42:58.3361823Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3362057Z cpu family : 6 2025-05-07T19:42:58.3362266Z model : 85 2025-05-07T19:42:58.3362529Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3362884Z stepping : 7 2025-05-07T19:42:58.3363081Z microcode : 0x5003901 2025-05-07T19:42:58.3363310Z cpu MHz : 3000.006 2025-05-07T19:42:58.3363516Z cache size : 36608 KB 2025-05-07T19:42:58.3363801Z physical id : 0 2025-05-07T19:42:58.3364016Z siblings : 48 2025-05-07T19:42:58.3364203Z core id : 11 2025-05-07T19:42:58.3364411Z cpu cores : 24 2025-05-07T19:42:58.3364606Z apicid : 22 2025-05-07T19:42:58.3364818Z initial apicid : 22 2025-05-07T19:42:58.3365015Z fpu : yes 2025-05-07T19:42:58.3365220Z fpu_exception : yes 2025-05-07T19:42:58.3365429Z cpuid level : 13 2025-05-07T19:42:58.3365640Z wp : yes 2025-05-07T19:42:58.3368275Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3371002Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3371611Z bogomips : 6000.01 2025-05-07T19:42:58.3371836Z clflush size : 64 2025-05-07T19:42:58.3372066Z cache_alignment : 64 2025-05-07T19:42:58.3372355Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3372682Z power management: 2025-05-07T19:42:58.3372816Z 2025-05-07T19:42:58.3372914Z processor : 12 2025-05-07T19:42:58.3373128Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3373378Z cpu family : 6 2025-05-07T19:42:58.3373568Z model : 85 2025-05-07T19:42:58.3373851Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3374197Z stepping : 7 2025-05-07T19:42:58.3374413Z microcode : 0x5003901 2025-05-07T19:42:58.3374635Z cpu MHz : 3000.006 2025-05-07T19:42:58.3374860Z cache size : 36608 KB 2025-05-07T19:42:58.3375100Z physical id : 0 2025-05-07T19:42:58.3375312Z siblings : 48 2025-05-07T19:42:58.3375522Z core id : 12 2025-05-07T19:42:58.3375718Z cpu cores : 24 2025-05-07T19:42:58.3375932Z apicid : 24 2025-05-07T19:42:58.3376136Z initial apicid : 24 2025-05-07T19:42:58.3376355Z fpu : yes 2025-05-07T19:42:58.3376554Z fpu_exception : yes 2025-05-07T19:42:58.3376789Z cpuid level : 13 2025-05-07T19:42:58.3377073Z wp : yes 2025-05-07T19:42:58.3379418Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3382130Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3382722Z bogomips : 6000.01 2025-05-07T19:42:58.3382950Z clflush size : 64 2025-05-07T19:42:58.3385794Z cache_alignment : 64 2025-05-07T19:42:58.3386084Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3386410Z power management: 2025-05-07T19:42:58.3386558Z 2025-05-07T19:42:58.3386645Z processor : 13 2025-05-07T19:42:58.3386861Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3387107Z cpu family : 6 2025-05-07T19:42:58.3387312Z model : 85 2025-05-07T19:42:58.3387582Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3387945Z stepping : 7 2025-05-07T19:42:58.3388146Z microcode : 0x5003901 2025-05-07T19:42:58.3388379Z cpu MHz : 3000.006 2025-05-07T19:42:58.3388594Z cache size : 36608 KB 2025-05-07T19:42:58.3388827Z physical id : 0 2025-05-07T19:42:58.3389033Z siblings : 48 2025-05-07T19:42:58.3389240Z core id : 13 2025-05-07T19:42:58.3389524Z cpu cores : 24 2025-05-07T19:42:58.3389746Z apicid : 26 2025-05-07T19:42:58.3389946Z initial apicid : 26 2025-05-07T19:42:58.3390176Z fpu : yes 2025-05-07T19:42:58.3390374Z fpu_exception : yes 2025-05-07T19:42:58.3390613Z cpuid level : 13 2025-05-07T19:42:58.3390835Z wp : yes 2025-05-07T19:42:58.3393174Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3395913Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3396517Z bogomips : 6000.01 2025-05-07T19:42:58.3396731Z clflush size : 64 2025-05-07T19:42:58.3396960Z cache_alignment : 64 2025-05-07T19:42:58.3397235Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3397569Z power management: 2025-05-07T19:42:58.3397702Z 2025-05-07T19:42:58.3397787Z processor : 14 2025-05-07T19:42:58.3398012Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3398251Z cpu family : 6 2025-05-07T19:42:58.3398466Z model : 85 2025-05-07T19:42:58.3398753Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3399104Z stepping : 7 2025-05-07T19:42:58.3399323Z microcode : 0x5003901 2025-05-07T19:42:58.3399545Z cpu MHz : 3264.428 2025-05-07T19:42:58.3399772Z cache size : 36608 KB 2025-05-07T19:42:58.3399998Z physical id : 0 2025-05-07T19:42:58.3400214Z siblings : 48 2025-05-07T19:42:58.3400411Z core id : 14 2025-05-07T19:42:58.3400621Z cpu cores : 24 2025-05-07T19:42:58.3400826Z apicid : 28 2025-05-07T19:42:58.3401038Z initial apicid : 28 2025-05-07T19:42:58.3401248Z fpu : yes 2025-05-07T19:42:58.3401450Z fpu_exception : yes 2025-05-07T19:42:58.3401677Z cpuid level : 13 2025-05-07T19:42:58.3401885Z wp : yes 2025-05-07T19:42:58.3404224Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3406943Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3407535Z bogomips : 6000.01 2025-05-07T19:42:58.3407765Z clflush size : 64 2025-05-07T19:42:58.3407981Z cache_alignment : 64 2025-05-07T19:42:58.3408269Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3408658Z power management: 2025-05-07T19:42:58.3408861Z 2025-05-07T19:42:58.3408949Z processor : 15 2025-05-07T19:42:58.3409167Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3409415Z cpu family : 6 2025-05-07T19:42:58.3409625Z model : 85 2025-05-07T19:42:58.3409895Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3410253Z stepping : 7 2025-05-07T19:42:58.3410454Z microcode : 0x5003901 2025-05-07T19:42:58.3410685Z cpu MHz : 3177.566 2025-05-07T19:42:58.3410895Z cache size : 36608 KB 2025-05-07T19:42:58.3411126Z physical id : 0 2025-05-07T19:42:58.3411329Z siblings : 48 2025-05-07T19:42:58.3411538Z core id : 15 2025-05-07T19:42:58.3411732Z cpu cores : 24 2025-05-07T19:42:58.3411944Z apicid : 30 2025-05-07T19:42:58.3412202Z initial apicid : 30 2025-05-07T19:42:58.3412428Z fpu : yes 2025-05-07T19:42:58.3412637Z fpu_exception : yes 2025-05-07T19:42:58.3412849Z cpuid level : 13 2025-05-07T19:42:58.3413068Z wp : yes 2025-05-07T19:42:58.3415405Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3418205Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3418814Z bogomips : 6000.01 2025-05-07T19:42:58.3419038Z clflush size : 64 2025-05-07T19:42:58.3419538Z cache_alignment : 64 2025-05-07T19:42:58.3419952Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3420355Z power management: 2025-05-07T19:42:58.3420494Z 2025-05-07T19:42:58.3420594Z processor : 16 2025-05-07T19:42:58.3420811Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3421061Z cpu family : 6 2025-05-07T19:42:58.3421259Z model : 85 2025-05-07T19:42:58.3421541Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3421894Z stepping : 7 2025-05-07T19:42:58.3422108Z microcode : 0x5003901 2025-05-07T19:42:58.3422331Z cpu MHz : 3000.006 2025-05-07T19:42:58.3422557Z cache size : 36608 KB 2025-05-07T19:42:58.3422792Z physical id : 0 2025-05-07T19:42:58.3422996Z siblings : 48 2025-05-07T19:42:58.3423205Z core id : 16 2025-05-07T19:42:58.3423400Z cpu cores : 24 2025-05-07T19:42:58.3423613Z apicid : 32 2025-05-07T19:42:58.3423811Z initial apicid : 32 2025-05-07T19:42:58.3424037Z fpu : yes 2025-05-07T19:42:58.3424229Z fpu_exception : yes 2025-05-07T19:42:58.3424454Z cpuid level : 13 2025-05-07T19:42:58.3424653Z wp : yes 2025-05-07T19:42:58.3426982Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3429695Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3430284Z bogomips : 6000.01 2025-05-07T19:42:58.3430511Z clflush size : 64 2025-05-07T19:42:58.3430734Z cache_alignment : 64 2025-05-07T19:42:58.3431000Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3431337Z power management: 2025-05-07T19:42:58.3431543Z 2025-05-07T19:42:58.3431627Z processor : 17 2025-05-07T19:42:58.3431848Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3432079Z cpu family : 6 2025-05-07T19:42:58.3432289Z model : 85 2025-05-07T19:42:58.3432665Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3433012Z stepping : 7 2025-05-07T19:42:58.3433209Z microcode : 0x5003901 2025-05-07T19:42:58.3433439Z cpu MHz : 3203.058 2025-05-07T19:42:58.3433649Z cache size : 36608 KB 2025-05-07T19:42:58.3433878Z physical id : 0 2025-05-07T19:42:58.3434096Z siblings : 48 2025-05-07T19:42:58.3434289Z core id : 17 2025-05-07T19:42:58.3434496Z cpu cores : 24 2025-05-07T19:42:58.3434690Z apicid : 34 2025-05-07T19:42:58.3435043Z initial apicid : 34 2025-05-07T19:42:58.3435247Z fpu : yes 2025-05-07T19:42:58.3435501Z fpu_exception : yes 2025-05-07T19:42:58.3435710Z cpuid level : 13 2025-05-07T19:42:58.3435919Z wp : yes 2025-05-07T19:42:58.3438198Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3440823Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3441414Z bogomips : 6000.01 2025-05-07T19:42:58.3441625Z clflush size : 64 2025-05-07T19:42:58.3441851Z cache_alignment : 64 2025-05-07T19:42:58.3442127Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3442441Z power management: 2025-05-07T19:42:58.3442571Z 2025-05-07T19:42:58.3442667Z processor : 18 2025-05-07T19:42:58.3442880Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3443123Z cpu family : 6 2025-05-07T19:42:58.3443321Z model : 85 2025-05-07T19:42:58.3443595Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3443935Z stepping : 7 2025-05-07T19:42:58.3444143Z microcode : 0x5003901 2025-05-07T19:42:58.3444373Z cpu MHz : 3000.006 2025-05-07T19:42:58.3444580Z cache size : 36608 KB 2025-05-07T19:42:58.3444805Z physical id : 0 2025-05-07T19:42:58.3445002Z siblings : 48 2025-05-07T19:42:58.3445201Z core id : 18 2025-05-07T19:42:58.3445391Z cpu cores : 24 2025-05-07T19:42:58.3445594Z apicid : 36 2025-05-07T19:42:58.3445788Z initial apicid : 36 2025-05-07T19:42:58.3445997Z fpu : yes 2025-05-07T19:42:58.3446179Z fpu_exception : yes 2025-05-07T19:42:58.3446396Z cpuid level : 13 2025-05-07T19:42:58.3446582Z wp : yes 2025-05-07T19:42:58.3448846Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3451482Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3452057Z bogomips : 6000.01 2025-05-07T19:42:58.3452253Z clflush size : 64 2025-05-07T19:42:58.3452470Z cache_alignment : 64 2025-05-07T19:42:58.3452729Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3453040Z power management: 2025-05-07T19:42:58.3453160Z 2025-05-07T19:42:58.3453235Z processor : 19 2025-05-07T19:42:58.3453445Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3453726Z cpu family : 6 2025-05-07T19:42:58.3453937Z model : 85 2025-05-07T19:42:58.3454212Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3454563Z stepping : 7 2025-05-07T19:42:58.3454766Z microcode : 0x5003901 2025-05-07T19:42:58.3454979Z cpu MHz : 3275.652 2025-05-07T19:42:58.3455188Z cache size : 36608 KB 2025-05-07T19:42:58.3455396Z physical id : 0 2025-05-07T19:42:58.3455593Z siblings : 48 2025-05-07T19:42:58.3455776Z core id : 19 2025-05-07T19:42:58.3455966Z cpu cores : 24 2025-05-07T19:42:58.3456151Z apicid : 38 2025-05-07T19:42:58.3456349Z initial apicid : 38 2025-05-07T19:42:58.3456542Z fpu : yes 2025-05-07T19:42:58.3456732Z fpu_exception : yes 2025-05-07T19:42:58.3456962Z cpuid level : 13 2025-05-07T19:42:58.3457237Z wp : yes 2025-05-07T19:42:58.3459827Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3462533Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3463122Z bogomips : 6000.01 2025-05-07T19:42:58.3463351Z clflush size : 64 2025-05-07T19:42:58.3463565Z cache_alignment : 64 2025-05-07T19:42:58.3463866Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3464211Z power management: 2025-05-07T19:42:58.3464351Z 2025-05-07T19:42:58.3464440Z processor : 20 2025-05-07T19:42:58.3464673Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3464916Z cpu family : 6 2025-05-07T19:42:58.3465136Z model : 85 2025-05-07T19:42:58.3465411Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3465787Z stepping : 7 2025-05-07T19:42:58.3465982Z microcode : 0x5003901 2025-05-07T19:42:58.3466222Z cpu MHz : 3280.999 2025-05-07T19:42:58.3466431Z cache size : 36608 KB 2025-05-07T19:42:58.3466666Z physical id : 0 2025-05-07T19:42:58.3466873Z siblings : 48 2025-05-07T19:42:58.3467079Z core id : 20 2025-05-07T19:42:58.3467288Z cpu cores : 24 2025-05-07T19:42:58.3467624Z apicid : 40 2025-05-07T19:42:58.3467845Z initial apicid : 40 2025-05-07T19:42:58.3468056Z fpu : yes 2025-05-07T19:42:58.3468269Z fpu_exception : yes 2025-05-07T19:42:58.3468473Z cpuid level : 13 2025-05-07T19:42:58.3468692Z wp : yes 2025-05-07T19:42:58.3471020Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3473758Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3474365Z bogomips : 6000.01 2025-05-07T19:42:58.3474574Z clflush size : 64 2025-05-07T19:42:58.3474808Z cache_alignment : 64 2025-05-07T19:42:58.3475070Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3475408Z power management: 2025-05-07T19:42:58.3475541Z 2025-05-07T19:42:58.3475638Z processor : 21 2025-05-07T19:42:58.3475853Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3476097Z cpu family : 6 2025-05-07T19:42:58.3476297Z model : 85 2025-05-07T19:42:58.3476706Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3477058Z stepping : 7 2025-05-07T19:42:58.3477266Z microcode : 0x5003901 2025-05-07T19:42:58.3477482Z cpu MHz : 3264.269 2025-05-07T19:42:58.3477689Z cache size : 36608 KB 2025-05-07T19:42:58.3477898Z physical id : 0 2025-05-07T19:42:58.3478102Z siblings : 48 2025-05-07T19:42:58.3478289Z core id : 21 2025-05-07T19:42:58.3478482Z cpu cores : 24 2025-05-07T19:42:58.3478684Z apicid : 42 2025-05-07T19:42:58.3478880Z initial apicid : 42 2025-05-07T19:42:58.3479114Z fpu : yes 2025-05-07T19:42:58.3479419Z fpu_exception : yes 2025-05-07T19:42:58.3479642Z cpuid level : 13 2025-05-07T19:42:58.3479836Z wp : yes 2025-05-07T19:42:58.3482079Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3484593Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3485139Z bogomips : 6000.01 2025-05-07T19:42:58.3485350Z clflush size : 64 2025-05-07T19:42:58.3485552Z cache_alignment : 64 2025-05-07T19:42:58.3485835Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3486148Z power management: 2025-05-07T19:42:58.3486274Z 2025-05-07T19:42:58.3486358Z processor : 22 2025-05-07T19:42:58.3486566Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3486795Z cpu family : 6 2025-05-07T19:42:58.3486994Z model : 85 2025-05-07T19:42:58.3487249Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3487593Z stepping : 7 2025-05-07T19:42:58.3487776Z microcode : 0x5003901 2025-05-07T19:42:58.3487992Z cpu MHz : 3212.970 2025-05-07T19:42:58.3488183Z cache size : 36608 KB 2025-05-07T19:42:58.3488390Z physical id : 0 2025-05-07T19:42:58.3488572Z siblings : 48 2025-05-07T19:42:58.3488773Z core id : 22 2025-05-07T19:42:58.3488980Z cpu cores : 24 2025-05-07T19:42:58.3489161Z apicid : 44 2025-05-07T19:42:58.3489371Z initial apicid : 44 2025-05-07T19:42:58.3489562Z fpu : yes 2025-05-07T19:42:58.3489764Z fpu_exception : yes 2025-05-07T19:42:58.3489962Z cpuid level : 13 2025-05-07T19:42:58.3490164Z wp : yes 2025-05-07T19:42:58.3492318Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3494815Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3495371Z bogomips : 6000.01 2025-05-07T19:42:58.3495568Z clflush size : 64 2025-05-07T19:42:58.3495791Z cache_alignment : 64 2025-05-07T19:42:58.3496037Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3496356Z power management: 2025-05-07T19:42:58.3496477Z 2025-05-07T19:42:58.3496577Z processor : 23 2025-05-07T19:42:58.3496783Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3497113Z cpu family : 6 2025-05-07T19:42:58.3497313Z model : 85 2025-05-07T19:42:58.3497759Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3498109Z stepping : 7 2025-05-07T19:42:58.3498426Z microcode : 0x5003901 2025-05-07T19:42:58.3498648Z cpu MHz : 3240.105 2025-05-07T19:42:58.3498879Z cache size : 36608 KB 2025-05-07T19:42:58.3499103Z physical id : 0 2025-05-07T19:42:58.3499324Z siblings : 48 2025-05-07T19:42:58.3499550Z core id : 23 2025-05-07T19:42:58.3499751Z cpu cores : 24 2025-05-07T19:42:58.3499975Z apicid : 46 2025-05-07T19:42:58.3500176Z initial apicid : 46 2025-05-07T19:42:58.3500414Z fpu : yes 2025-05-07T19:42:58.3500614Z fpu_exception : yes 2025-05-07T19:42:58.3500844Z cpuid level : 13 2025-05-07T19:42:58.3501049Z wp : yes 2025-05-07T19:42:58.3503438Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3506174Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3506766Z bogomips : 6000.01 2025-05-07T19:42:58.3507004Z clflush size : 64 2025-05-07T19:42:58.3507218Z cache_alignment : 64 2025-05-07T19:42:58.3507537Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3507913Z power management: 2025-05-07T19:42:58.3508058Z 2025-05-07T19:42:58.3508155Z processor : 24 2025-05-07T19:42:58.3508421Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3508680Z cpu family : 6 2025-05-07T19:42:58.3508934Z model : 85 2025-05-07T19:42:58.3509234Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3509739Z stepping : 7 2025-05-07T19:42:58.3509949Z microcode : 0x5003901 2025-05-07T19:42:58.3510203Z cpu MHz : 1200.013 2025-05-07T19:42:58.3510430Z cache size : 36608 KB 2025-05-07T19:42:58.3510674Z physical id : 1 2025-05-07T19:42:58.3510888Z siblings : 48 2025-05-07T19:42:58.3511119Z core id : 0 2025-05-07T19:42:58.3511347Z cpu cores : 24 2025-05-07T19:42:58.3511549Z apicid : 64 2025-05-07T19:42:58.3511770Z initial apicid : 64 2025-05-07T19:42:58.3511981Z fpu : yes 2025-05-07T19:42:58.3512206Z fpu_exception : yes 2025-05-07T19:42:58.3512425Z cpuid level : 13 2025-05-07T19:42:58.3512641Z wp : yes 2025-05-07T19:42:58.3514803Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3517362Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3517947Z bogomips : 6000.01 2025-05-07T19:42:58.3518166Z clflush size : 64 2025-05-07T19:42:58.3518409Z cache_alignment : 64 2025-05-07T19:42:58.3518678Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3519021Z power management: 2025-05-07T19:42:58.3519154Z 2025-05-07T19:42:58.3519272Z processor : 25 2025-05-07T19:42:58.3519498Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3519760Z cpu family : 6 2025-05-07T19:42:58.3519962Z model : 85 2025-05-07T19:42:58.3520257Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3520597Z stepping : 7 2025-05-07T19:42:58.3520828Z microcode : 0x5003901 2025-05-07T19:42:58.3521050Z cpu MHz : 3000.006 2025-05-07T19:42:58.3521343Z cache size : 36608 KB 2025-05-07T19:42:58.3521573Z physical id : 1 2025-05-07T19:42:58.3521817Z siblings : 48 2025-05-07T19:42:58.3522053Z core id : 1 2025-05-07T19:42:58.3522259Z cpu cores : 24 2025-05-07T19:42:58.3522499Z apicid : 66 2025-05-07T19:42:58.3522710Z initial apicid : 66 2025-05-07T19:42:58.3522956Z fpu : yes 2025-05-07T19:42:58.3523161Z fpu_exception : yes 2025-05-07T19:42:58.3523402Z cpuid level : 13 2025-05-07T19:42:58.3523601Z wp : yes 2025-05-07T19:42:58.3525836Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3528373Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3528937Z bogomips : 6000.01 2025-05-07T19:42:58.3529175Z clflush size : 64 2025-05-07T19:42:58.3529391Z cache_alignment : 64 2025-05-07T19:42:58.3529677Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3530024Z power management: 2025-05-07T19:42:58.3530159Z 2025-05-07T19:42:58.3530242Z processor : 26 2025-05-07T19:42:58.3530480Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3530865Z cpu family : 6 2025-05-07T19:42:58.3531092Z model : 85 2025-05-07T19:42:58.3531366Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3531733Z stepping : 7 2025-05-07T19:42:58.3531938Z microcode : 0x5003901 2025-05-07T19:42:58.3532182Z cpu MHz : 3000.006 2025-05-07T19:42:58.3532395Z cache size : 36608 KB 2025-05-07T19:42:58.3532644Z physical id : 1 2025-05-07T19:42:58.3532850Z siblings : 48 2025-05-07T19:42:58.3533072Z core id : 2 2025-05-07T19:42:58.3533288Z cpu cores : 24 2025-05-07T19:42:58.3533491Z apicid : 68 2025-05-07T19:42:58.3533718Z initial apicid : 68 2025-05-07T19:42:58.3533929Z fpu : yes 2025-05-07T19:42:58.3534148Z fpu_exception : yes 2025-05-07T19:42:58.3534365Z cpuid level : 13 2025-05-07T19:42:58.3534592Z wp : yes 2025-05-07T19:42:58.3536770Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3539692Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3540322Z bogomips : 6000.01 2025-05-07T19:42:58.3540552Z clflush size : 64 2025-05-07T19:42:58.3540819Z cache_alignment : 64 2025-05-07T19:42:58.3541112Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3541489Z power management: 2025-05-07T19:42:58.3541633Z 2025-05-07T19:42:58.3541760Z processor : 27 2025-05-07T19:42:58.3541998Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3542272Z cpu family : 6 2025-05-07T19:42:58.3542490Z model : 85 2025-05-07T19:42:58.3542801Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3543173Z stepping : 7 2025-05-07T19:42:58.3543414Z microcode : 0x5003901 2025-05-07T19:42:58.3543659Z cpu MHz : 3000.006 2025-05-07T19:42:58.3543909Z cache size : 36608 KB 2025-05-07T19:42:58.3544137Z physical id : 1 2025-05-07T19:42:58.3544432Z siblings : 48 2025-05-07T19:42:58.3544658Z core id : 3 2025-05-07T19:42:58.3544855Z cpu cores : 24 2025-05-07T19:42:58.3545082Z apicid : 70 2025-05-07T19:42:58.3545284Z initial apicid : 70 2025-05-07T19:42:58.3545513Z fpu : yes 2025-05-07T19:42:58.3545712Z fpu_exception : yes 2025-05-07T19:42:58.3545945Z cpuid level : 13 2025-05-07T19:42:58.3546160Z wp : yes 2025-05-07T19:42:58.3548566Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3551276Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3551826Z bogomips : 6000.01 2025-05-07T19:42:58.3552042Z clflush size : 64 2025-05-07T19:42:58.3552244Z cache_alignment : 64 2025-05-07T19:42:58.3552513Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3552845Z power management: 2025-05-07T19:42:58.3552965Z 2025-05-07T19:42:58.3553044Z processor : 28 2025-05-07T19:42:58.3553269Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3553491Z cpu family : 6 2025-05-07T19:42:58.3553702Z model : 85 2025-05-07T19:42:58.3553951Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3554290Z stepping : 7 2025-05-07T19:42:58.3554484Z microcode : 0x5003901 2025-05-07T19:42:58.3554714Z cpu MHz : 3000.006 2025-05-07T19:42:58.3554919Z cache size : 36608 KB 2025-05-07T19:42:58.3555152Z physical id : 1 2025-05-07T19:42:58.3555346Z siblings : 48 2025-05-07T19:42:58.3555544Z core id : 4 2025-05-07T19:42:58.3555744Z cpu cores : 24 2025-05-07T19:42:58.3555929Z apicid : 72 2025-05-07T19:42:58.3556145Z initial apicid : 72 2025-05-07T19:42:58.3556343Z fpu : yes 2025-05-07T19:42:58.3556548Z fpu_exception : yes 2025-05-07T19:42:58.3556758Z cpuid level : 13 2025-05-07T19:42:58.3556968Z wp : yes 2025-05-07T19:42:58.3559135Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3561696Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3562275Z bogomips : 6000.01 2025-05-07T19:42:58.3562480Z clflush size : 64 2025-05-07T19:42:58.3562713Z cache_alignment : 64 2025-05-07T19:42:58.3562965Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3563295Z power management: 2025-05-07T19:42:58.3563420Z 2025-05-07T19:42:58.3563513Z processor : 29 2025-05-07T19:42:58.3563711Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3563958Z cpu family : 6 2025-05-07T19:42:58.3564149Z model : 85 2025-05-07T19:42:58.3564428Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3564765Z stepping : 7 2025-05-07T19:42:58.3564983Z microcode : 0x5003901 2025-05-07T19:42:58.3565204Z cpu MHz : 3000.006 2025-05-07T19:42:58.3565419Z cache size : 36608 KB 2025-05-07T19:42:58.3565634Z physical id : 1 2025-05-07T19:42:58.3565853Z siblings : 48 2025-05-07T19:42:58.3566066Z core id : 5 2025-05-07T19:42:58.3566253Z cpu cores : 24 2025-05-07T19:42:58.3566524Z apicid : 74 2025-05-07T19:42:58.3566725Z initial apicid : 74 2025-05-07T19:42:58.3566960Z fpu : yes 2025-05-07T19:42:58.3567154Z fpu_exception : yes 2025-05-07T19:42:58.3567509Z cpuid level : 13 2025-05-07T19:42:58.3567880Z wp : yes 2025-05-07T19:42:58.3570367Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3573088Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3573688Z bogomips : 6000.01 2025-05-07T19:42:58.3573936Z clflush size : 64 2025-05-07T19:42:58.3574164Z cache_alignment : 64 2025-05-07T19:42:58.3574450Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3574804Z power management: 2025-05-07T19:42:58.3574944Z 2025-05-07T19:42:58.3575030Z processor : 30 2025-05-07T19:42:58.3575272Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3575511Z cpu family : 6 2025-05-07T19:42:58.3575706Z model : 85 2025-05-07T19:42:58.3575980Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3576341Z stepping : 7 2025-05-07T19:42:58.3576544Z microcode : 0x5003901 2025-05-07T19:42:58.3576785Z cpu MHz : 1200.271 2025-05-07T19:42:58.3577063Z cache size : 36608 KB 2025-05-07T19:42:58.3577306Z physical id : 1 2025-05-07T19:42:58.3577510Z siblings : 48 2025-05-07T19:42:58.3577730Z core id : 6 2025-05-07T19:42:58.3577946Z cpu cores : 24 2025-05-07T19:42:58.3578150Z apicid : 76 2025-05-07T19:42:58.3578373Z initial apicid : 76 2025-05-07T19:42:58.3578590Z fpu : yes 2025-05-07T19:42:58.3578806Z fpu_exception : yes 2025-05-07T19:42:58.3579021Z cpuid level : 13 2025-05-07T19:42:58.3579237Z wp : yes 2025-05-07T19:42:58.3581573Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3584280Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3584878Z bogomips : 6000.01 2025-05-07T19:42:58.3585098Z clflush size : 64 2025-05-07T19:42:58.3585311Z cache_alignment : 64 2025-05-07T19:42:58.3585580Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3585904Z power management: 2025-05-07T19:42:58.3586031Z 2025-05-07T19:42:58.3586117Z processor : 31 2025-05-07T19:42:58.3586319Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3586553Z cpu family : 6 2025-05-07T19:42:58.3586742Z model : 85 2025-05-07T19:42:58.3587022Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3587375Z stepping : 7 2025-05-07T19:42:58.3587611Z microcode : 0x5003901 2025-05-07T19:42:58.3587834Z cpu MHz : 1199.743 2025-05-07T19:42:58.3588063Z cache size : 36608 KB 2025-05-07T19:42:58.3588291Z physical id : 1 2025-05-07T19:42:58.3588523Z siblings : 48 2025-05-07T19:42:58.3588729Z core id : 7 2025-05-07T19:42:58.3588922Z cpu cores : 24 2025-05-07T19:42:58.3589250Z apicid : 78 2025-05-07T19:42:58.3589549Z initial apicid : 78 2025-05-07T19:42:58.3589858Z fpu : yes 2025-05-07T19:42:58.3590040Z fpu_exception : yes 2025-05-07T19:42:58.3590273Z cpuid level : 13 2025-05-07T19:42:58.3590461Z wp : yes 2025-05-07T19:42:58.3592681Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3595172Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3595708Z bogomips : 6000.01 2025-05-07T19:42:58.3595914Z clflush size : 64 2025-05-07T19:42:58.3596119Z cache_alignment : 64 2025-05-07T19:42:58.3596370Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3596677Z power management: 2025-05-07T19:42:58.3596797Z 2025-05-07T19:42:58.3596867Z processor : 32 2025-05-07T19:42:58.3597068Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3597279Z cpu family : 6 2025-05-07T19:42:58.3597464Z model : 85 2025-05-07T19:42:58.3597699Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3598020Z stepping : 7 2025-05-07T19:42:58.3598201Z microcode : 0x5003901 2025-05-07T19:42:58.3598407Z cpu MHz : 1200.334 2025-05-07T19:42:58.3598594Z cache size : 36608 KB 2025-05-07T19:42:58.3598829Z physical id : 1 2025-05-07T19:42:58.3599021Z siblings : 48 2025-05-07T19:42:58.3599229Z core id : 8 2025-05-07T19:42:58.3599430Z cpu cores : 24 2025-05-07T19:42:58.3599619Z apicid : 80 2025-05-07T19:42:58.3599825Z initial apicid : 80 2025-05-07T19:42:58.3600025Z fpu : yes 2025-05-07T19:42:58.3600229Z fpu_exception : yes 2025-05-07T19:42:58.3600432Z cpuid level : 13 2025-05-07T19:42:58.3600631Z wp : yes 2025-05-07T19:42:58.3602780Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3605271Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3605816Z bogomips : 6000.01 2025-05-07T19:42:58.3606003Z clflush size : 64 2025-05-07T19:42:58.3606203Z cache_alignment : 64 2025-05-07T19:42:58.3606445Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3606743Z power management: 2025-05-07T19:42:58.3606862Z 2025-05-07T19:42:58.3606944Z processor : 33 2025-05-07T19:42:58.3607135Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3607344Z cpu family : 6 2025-05-07T19:42:58.3607515Z model : 85 2025-05-07T19:42:58.3607761Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3608070Z stepping : 7 2025-05-07T19:42:58.3608255Z microcode : 0x5003901 2025-05-07T19:42:58.3608451Z cpu MHz : 1199.427 2025-05-07T19:42:58.3608645Z cache size : 36608 KB 2025-05-07T19:42:58.3608844Z physical id : 1 2025-05-07T19:42:58.3609043Z siblings : 48 2025-05-07T19:42:58.3609228Z core id : 9 2025-05-07T19:42:58.3609414Z cpu cores : 24 2025-05-07T19:42:58.3609604Z apicid : 82 2025-05-07T19:42:58.3609799Z initial apicid : 82 2025-05-07T19:42:58.3610001Z fpu : yes 2025-05-07T19:42:58.3610185Z fpu_exception : yes 2025-05-07T19:42:58.3610445Z cpuid level : 13 2025-05-07T19:42:58.3610641Z wp : yes 2025-05-07T19:42:58.3612824Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3615372Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3615921Z bogomips : 6000.01 2025-05-07T19:42:58.3616132Z clflush size : 64 2025-05-07T19:42:58.3616338Z cache_alignment : 64 2025-05-07T19:42:58.3616622Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3616944Z power management: 2025-05-07T19:42:58.3617131Z 2025-05-07T19:42:58.3617213Z processor : 34 2025-05-07T19:42:58.3617597Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3617840Z cpu family : 6 2025-05-07T19:42:58.3618064Z model : 85 2025-05-07T19:42:58.3618387Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3618761Z stepping : 7 2025-05-07T19:42:58.3618963Z microcode : 0x5003901 2025-05-07T19:42:58.3619203Z cpu MHz : 3000.006 2025-05-07T19:42:58.3619420Z cache size : 36608 KB 2025-05-07T19:42:58.3619653Z physical id : 1 2025-05-07T19:42:58.3619865Z siblings : 48 2025-05-07T19:42:58.3620080Z core id : 10 2025-05-07T19:42:58.3620274Z cpu cores : 24 2025-05-07T19:42:58.3620465Z apicid : 84 2025-05-07T19:42:58.3620684Z initial apicid : 84 2025-05-07T19:42:58.3620906Z fpu : yes 2025-05-07T19:42:58.3621127Z fpu_exception : yes 2025-05-07T19:42:58.3621355Z cpuid level : 13 2025-05-07T19:42:58.3621453Z wp : yes 2025-05-07T19:42:58.3623671Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3624075Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3624179Z bogomips : 6000.01 2025-05-07T19:42:58.3624265Z clflush size : 64 2025-05-07T19:42:58.3624356Z cache_alignment : 64 2025-05-07T19:42:58.3624506Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3624594Z power management: 2025-05-07T19:42:58.3624599Z 2025-05-07T19:42:58.3624685Z processor : 35 2025-05-07T19:42:58.3624782Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3624877Z cpu family : 6 2025-05-07T19:42:58.3624958Z model : 85 2025-05-07T19:42:58.3625128Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3625227Z stepping : 7 2025-05-07T19:42:58.3625314Z microcode : 0x5003901 2025-05-07T19:42:58.3625394Z cpu MHz : 1200.271 2025-05-07T19:42:58.3625485Z cache size : 36608 KB 2025-05-07T19:42:58.3625581Z physical id : 1 2025-05-07T19:42:58.3625662Z siblings : 48 2025-05-07T19:42:58.3625742Z core id : 11 2025-05-07T19:42:58.3625835Z cpu cores : 24 2025-05-07T19:42:58.3625914Z apicid : 86 2025-05-07T19:42:58.3626005Z initial apicid : 86 2025-05-07T19:42:58.3626085Z fpu : yes 2025-05-07T19:42:58.3626187Z fpu_exception : yes 2025-05-07T19:42:58.3626269Z cpuid level : 13 2025-05-07T19:42:58.3626345Z wp : yes 2025-05-07T19:42:58.3628573Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3629027Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3629110Z bogomips : 6000.01 2025-05-07T19:42:58.3629276Z clflush size : 64 2025-05-07T19:42:58.3629364Z cache_alignment : 64 2025-05-07T19:42:58.3629499Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3629604Z power management: 2025-05-07T19:42:58.3629612Z 2025-05-07T19:42:58.3629807Z processor : 36 2025-05-07T19:42:58.3629894Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3629972Z cpu family : 6 2025-05-07T19:42:58.3630062Z model : 85 2025-05-07T19:42:58.3630218Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3630299Z stepping : 7 2025-05-07T19:42:58.3630395Z microcode : 0x5003901 2025-05-07T19:42:58.3630476Z cpu MHz : 3000.006 2025-05-07T19:42:58.3630558Z cache size : 36608 KB 2025-05-07T19:42:58.3630637Z physical id : 1 2025-05-07T19:42:58.3630729Z siblings : 48 2025-05-07T19:42:58.3630805Z core id : 12 2025-05-07T19:42:58.3630882Z cpu cores : 24 2025-05-07T19:42:58.3630974Z apicid : 88 2025-05-07T19:42:58.3631055Z initial apicid : 88 2025-05-07T19:42:58.3631131Z fpu : yes 2025-05-07T19:42:58.3631216Z fpu_exception : yes 2025-05-07T19:42:58.3631308Z cpuid level : 13 2025-05-07T19:42:58.3631381Z wp : yes 2025-05-07T19:42:58.3633573Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3633956Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3634037Z bogomips : 6000.01 2025-05-07T19:42:58.3634118Z clflush size : 64 2025-05-07T19:42:58.3634218Z cache_alignment : 64 2025-05-07T19:42:58.3634342Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3634424Z power management: 2025-05-07T19:42:58.3634428Z 2025-05-07T19:42:58.3634525Z processor : 37 2025-05-07T19:42:58.3634610Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3634688Z cpu family : 6 2025-05-07T19:42:58.3634764Z model : 85 2025-05-07T19:42:58.3634930Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3635005Z stepping : 7 2025-05-07T19:42:58.3635085Z microcode : 0x5003901 2025-05-07T19:42:58.3635180Z cpu MHz : 3000.006 2025-05-07T19:42:58.3635258Z cache size : 36608 KB 2025-05-07T19:42:58.3635335Z physical id : 1 2025-05-07T19:42:58.3635409Z siblings : 48 2025-05-07T19:42:58.3635496Z core id : 13 2025-05-07T19:42:58.3635571Z cpu cores : 24 2025-05-07T19:42:58.3635644Z apicid : 90 2025-05-07T19:42:58.3635723Z initial apicid : 90 2025-05-07T19:42:58.3635806Z fpu : yes 2025-05-07T19:42:58.3635885Z fpu_exception : yes 2025-05-07T19:42:58.3635964Z cpuid level : 13 2025-05-07T19:42:58.3636049Z wp : yes 2025-05-07T19:42:58.3638099Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3638518Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3638608Z bogomips : 6000.01 2025-05-07T19:42:58.3638686Z clflush size : 64 2025-05-07T19:42:58.3638766Z cache_alignment : 64 2025-05-07T19:42:58.3640528Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3640676Z power management: 2025-05-07T19:42:58.3640680Z 2025-05-07T19:42:58.3640758Z processor : 38 2025-05-07T19:42:58.3640858Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3640939Z cpu family : 6 2025-05-07T19:42:58.3641013Z model : 85 2025-05-07T19:42:58.3641166Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3641257Z stepping : 7 2025-05-07T19:42:58.3641339Z microcode : 0x5003901 2025-05-07T19:42:58.3641416Z cpu MHz : 3000.006 2025-05-07T19:42:58.3641497Z cache size : 36608 KB 2025-05-07T19:42:58.3641585Z physical id : 1 2025-05-07T19:42:58.3641660Z siblings : 48 2025-05-07T19:42:58.3641735Z core id : 14 2025-05-07T19:42:58.3641823Z cpu cores : 24 2025-05-07T19:42:58.3641898Z apicid : 92 2025-05-07T19:42:58.3641978Z initial apicid : 92 2025-05-07T19:42:58.3642050Z fpu : yes 2025-05-07T19:42:58.3642142Z fpu_exception : yes 2025-05-07T19:42:58.3642219Z cpuid level : 13 2025-05-07T19:42:58.3642295Z wp : yes 2025-05-07T19:42:58.3644359Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3644724Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3644804Z bogomips : 6000.01 2025-05-07T19:42:58.3645255Z lspci: Unable to load libkmod resources: error -2 2025-05-07T19:42:58.3645333Z clflush size : 64 2025-05-07T19:42:58.3645417Z cache_alignment : 64 2025-05-07T19:42:58.3645556Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3645638Z power management: 2025-05-07T19:42:58.3645642Z 2025-05-07T19:42:58.3645723Z processor : 39 2025-05-07T19:42:58.3645827Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3645904Z cpu family : 6 2025-05-07T19:42:58.3645978Z model : 85 2025-05-07T19:42:58.3646131Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3646222Z stepping : 7 2025-05-07T19:42:58.3646303Z microcode : 0x5003901 2025-05-07T19:42:58.3646380Z cpu MHz : 3000.006 2025-05-07T19:42:58.3646472Z cache size : 36608 KB 2025-05-07T19:42:58.3646550Z physical id : 1 2025-05-07T19:42:58.3646625Z siblings : 48 2025-05-07T19:42:58.3646698Z core id : 15 2025-05-07T19:42:58.3646784Z cpu cores : 24 2025-05-07T19:42:58.3646858Z apicid : 94 2025-05-07T19:42:58.3646937Z initial apicid : 94 2025-05-07T19:42:58.3647010Z fpu : yes 2025-05-07T19:42:58.3647102Z fpu_exception : yes 2025-05-07T19:42:58.3647181Z cpuid level : 13 2025-05-07T19:42:58.3647254Z wp : yes 2025-05-07T19:42:58.3649312Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw 
avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3649732Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3649823Z bogomips : 6000.01 2025-05-07T19:42:58.3649901Z clflush size : 64 2025-05-07T19:42:58.3649982Z cache_alignment : 64 2025-05-07T19:42:58.3650148Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3650244Z power management: 2025-05-07T19:42:58.3650248Z 2025-05-07T19:42:58.3650326Z processor : 40 2025-05-07T19:42:58.3650411Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3650503Z cpu family : 6 2025-05-07T19:42:58.3650575Z model : 85 2025-05-07T19:42:58.3650726Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3650801Z stepping : 7 2025-05-07T19:42:58.3650894Z microcode : 0x5003901 2025-05-07T19:42:58.3650971Z cpu MHz : 3000.006 2025-05-07T19:42:58.3651049Z cache size : 36608 KB 2025-05-07T19:42:58.3651140Z physical id : 1 2025-05-07T19:42:58.3651215Z siblings : 48 2025-05-07T19:42:58.3651289Z core id : 16 2025-05-07T19:42:58.3651366Z cpu cores : 24 2025-05-07T19:42:58.3651453Z apicid : 96 2025-05-07T19:42:58.3651533Z initial apicid : 96 2025-05-07T19:42:58.3651605Z fpu : yes 2025-05-07T19:42:58.3651687Z fpu_exception : yes 2025-05-07T19:42:58.3651777Z cpuid level : 13 2025-05-07T19:42:58.3651850Z wp : yes 2025-05-07T19:42:58.3653893Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl 
xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3654272Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3654350Z bogomips : 6000.01 2025-05-07T19:42:58.3654427Z clflush size : 64 2025-05-07T19:42:58.3654519Z cache_alignment : 64 2025-05-07T19:42:58.3654641Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3654724Z power management: 2025-05-07T19:42:58.3654728Z 2025-05-07T19:42:58.3654817Z processor : 41 2025-05-07T19:42:58.3654901Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3654976Z cpu family : 6 2025-05-07T19:42:58.3655051Z model : 85 2025-05-07T19:42:58.3655215Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3655290Z stepping : 7 2025-05-07T19:42:58.3655370Z microcode : 0x5003901 2025-05-07T19:42:58.3655458Z cpu MHz : 3000.006 2025-05-07T19:42:58.3655539Z cache size : 36608 KB 2025-05-07T19:42:58.3655616Z physical id : 1 2025-05-07T19:42:58.3655691Z siblings : 48 2025-05-07T19:42:58.3655776Z core id : 17 2025-05-07T19:42:58.3655851Z cpu cores : 24 2025-05-07T19:42:58.3655925Z apicid : 98 2025-05-07T19:42:58.3656017Z initial apicid : 98 2025-05-07T19:42:58.3656090Z fpu : yes 2025-05-07T19:42:58.3656171Z fpu_exception : yes 2025-05-07T19:42:58.3656248Z cpuid level : 13 2025-05-07T19:42:58.3656336Z wp : yes 2025-05-07T19:42:58.3658720Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt 
xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3659221Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3659308Z bogomips : 6000.01 2025-05-07T19:42:58.3659392Z clflush size : 64 2025-05-07T19:42:58.3659480Z cache_alignment : 64 2025-05-07T19:42:58.3659626Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3659713Z power management: 2025-05-07T19:42:58.3659762Z 2025-05-07T19:42:58.3659848Z processor : 42 2025-05-07T19:42:58.3659955Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3660036Z cpu family : 6 2025-05-07T19:42:58.3660114Z model : 85 2025-05-07T19:42:58.3660279Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3660373Z stepping : 7 2025-05-07T19:42:58.3660460Z microcode : 0x5003901 2025-05-07T19:42:58.3660541Z cpu MHz : 3000.006 2025-05-07T19:42:58.3660637Z cache size : 36608 KB 2025-05-07T19:42:58.3660720Z physical id : 1 2025-05-07T19:42:58.3660800Z siblings : 48 2025-05-07T19:42:58.3660879Z core id : 18 2025-05-07T19:42:58.3660974Z cpu cores : 24 2025-05-07T19:42:58.3661054Z apicid : 100 2025-05-07T19:42:58.3661141Z initial apicid : 100 2025-05-07T19:42:58.3661235Z fpu : yes 2025-05-07T19:42:58.3661323Z fpu_exception : yes 2025-05-07T19:42:58.3661406Z cpuid level : 13 2025-05-07T19:42:58.3661487Z wp : yes 2025-05-07T19:42:58.3663726Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec 
xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3664127Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3664227Z bogomips : 6000.01 2025-05-07T19:42:58.3664310Z clflush size : 64 2025-05-07T19:42:58.3664398Z cache_alignment : 64 2025-05-07T19:42:58.3664532Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3664634Z power management: 2025-05-07T19:42:58.3664639Z 2025-05-07T19:42:58.3664724Z processor : 43 2025-05-07T19:42:58.3664818Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3664916Z cpu family : 6 2025-05-07T19:42:58.3664998Z model : 85 2025-05-07T19:42:58.3665162Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3665248Z stepping : 7 2025-05-07T19:42:58.3665349Z microcode : 0x5003901 2025-05-07T19:42:58.3665432Z cpu MHz : 3000.006 2025-05-07T19:42:58.3665518Z cache size : 36608 KB 2025-05-07T19:42:58.3665618Z physical id : 1 2025-05-07T19:42:58.3665698Z siblings : 48 2025-05-07T19:42:58.3665780Z core id : 19 2025-05-07T19:42:58.3665862Z cpu cores : 24 2025-05-07T19:42:58.3665960Z apicid : 102 2025-05-07T19:42:58.3666052Z initial apicid : 102 2025-05-07T19:42:58.3666132Z fpu : yes 2025-05-07T19:42:58.3666235Z fpu_exception : yes 2025-05-07T19:42:58.3666322Z cpuid level : 13 2025-05-07T19:42:58.3666403Z wp : yes 2025-05-07T19:42:58.3668765Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 
xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3669261Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3669347Z bogomips : 6000.01 2025-05-07T19:42:58.3669446Z clflush size : 64 2025-05-07T19:42:58.3669533Z cache_alignment : 64 2025-05-07T19:42:58.3669666Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3669755Z power management: 2025-05-07T19:42:58.3669759Z 2025-05-07T19:42:58.3669855Z processor : 44 2025-05-07T19:42:58.3670012Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3670096Z cpu family : 6 2025-05-07T19:42:58.3670189Z model : 85 2025-05-07T19:42:58.3670356Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3670441Z stepping : 7 2025-05-07T19:42:58.3670532Z microcode : 0x5003901 2025-05-07T19:42:58.3670632Z cpu MHz : 3000.006 2025-05-07T19:42:58.3670720Z cache size : 36608 KB 2025-05-07T19:42:58.3670804Z physical id : 1 2025-05-07T19:42:58.3670901Z siblings : 48 2025-05-07T19:42:58.3670982Z core id : 20 2025-05-07T19:42:58.3671068Z cpu cores : 24 2025-05-07T19:42:58.3671151Z apicid : 104 2025-05-07T19:42:58.3671249Z initial apicid : 104 2025-05-07T19:42:58.3671332Z fpu : yes 2025-05-07T19:42:58.3671419Z fpu_exception : yes 2025-05-07T19:42:58.3671516Z cpuid level : 13 2025-05-07T19:42:58.3671594Z wp : yes 2025-05-07T19:42:58.3673800Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida 
arat pku ospke avx512_vnni 2025-05-07T19:42:58.3674208Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3674291Z bogomips : 6000.01 2025-05-07T19:42:58.3674374Z clflush size : 64 2025-05-07T19:42:58.3674472Z cache_alignment : 64 2025-05-07T19:42:58.3674605Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3674690Z power management: 2025-05-07T19:42:58.3674695Z 2025-05-07T19:42:58.3674777Z processor : 45 2025-05-07T19:42:58.3674878Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3674962Z cpu family : 6 2025-05-07T19:42:58.3675040Z model : 85 2025-05-07T19:42:58.3675215Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3675296Z stepping : 7 2025-05-07T19:42:58.3675381Z microcode : 0x5003901 2025-05-07T19:42:58.3675466Z cpu MHz : 3000.006 2025-05-07T19:42:58.3675560Z cache size : 36608 KB 2025-05-07T19:42:58.3675643Z physical id : 1 2025-05-07T19:42:58.3675723Z siblings : 48 2025-05-07T19:42:58.3675815Z core id : 21 2025-05-07T19:42:58.3675896Z cpu cores : 24 2025-05-07T19:42:58.3675974Z apicid : 106 2025-05-07T19:42:58.3676062Z initial apicid : 106 2025-05-07T19:42:58.3676154Z fpu : yes 2025-05-07T19:42:58.3676240Z fpu_exception : yes 2025-05-07T19:42:58.3676323Z cpuid level : 13 2025-05-07T19:42:58.3676403Z wp : yes 2025-05-07T19:42:58.3678649Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku 
ospke avx512_vnni 2025-05-07T19:42:58.3679191Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3679286Z bogomips : 6000.01 2025-05-07T19:42:58.3679366Z clflush size : 64 2025-05-07T19:42:58.3679449Z cache_alignment : 64 2025-05-07T19:42:58.3679586Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3679667Z power management: 2025-05-07T19:42:58.3679671Z 2025-05-07T19:42:58.3679748Z processor : 46 2025-05-07T19:42:58.3679832Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3679917Z cpu family : 6 2025-05-07T19:42:58.3694217Z model : 85 2025-05-07T19:42:58.3694573Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3694657Z stepping : 7 2025-05-07T19:42:58.3694803Z microcode : 0x5003901 2025-05-07T19:42:58.3694878Z cpu MHz : 3000.006 2025-05-07T19:42:58.3694962Z cache size : 36608 KB 2025-05-07T19:42:58.3695038Z physical id : 1 2025-05-07T19:42:58.3695117Z siblings : 48 2025-05-07T19:42:58.3695188Z core id : 22 2025-05-07T19:42:58.3695260Z cpu cores : 24 2025-05-07T19:42:58.3695336Z apicid : 108 2025-05-07T19:42:58.3695411Z initial apicid : 108 2025-05-07T19:42:58.3695480Z fpu : yes 2025-05-07T19:42:58.3695561Z fpu_exception : yes 2025-05-07T19:42:58.3695638Z cpuid level : 13 2025-05-07T19:42:58.3695708Z wp : yes 2025-05-07T19:42:58.3698069Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke 
avx512_vnni 2025-05-07T19:42:58.3698483Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3698567Z bogomips : 6000.01 2025-05-07T19:42:58.3698646Z clflush size : 64 2025-05-07T19:42:58.3698736Z cache_alignment : 64 2025-05-07T19:42:58.3698868Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3698949Z power management: 2025-05-07T19:42:58.3698955Z 2025-05-07T19:42:58.3699047Z processor : 47 2025-05-07T19:42:58.3699139Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3699216Z cpu family : 6 2025-05-07T19:42:58.3699293Z model : 85 2025-05-07T19:42:58.3699474Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3699551Z stepping : 7 2025-05-07T19:42:58.3699633Z microcode : 0x5003901 2025-05-07T19:42:58.3699721Z cpu MHz : 1199.565 2025-05-07T19:42:58.3699802Z cache size : 36608 KB 2025-05-07T19:42:58.3699883Z physical id : 1 2025-05-07T19:42:58.3699959Z siblings : 48 2025-05-07T19:42:58.3700045Z core id : 23 2025-05-07T19:42:58.3700121Z cpu cores : 24 2025-05-07T19:42:58.3700198Z apicid : 110 2025-05-07T19:42:58.3700291Z initial apicid : 110 2025-05-07T19:42:58.3700366Z fpu : yes 2025-05-07T19:42:58.3700447Z fpu_exception : yes 2025-05-07T19:42:58.3700525Z cpuid level : 13 2025-05-07T19:42:58.3700611Z wp : yes 2025-05-07T19:42:58.3702832Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke 
avx512_vnni 2025-05-07T19:42:58.3703294Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3703378Z bogomips : 6000.01 2025-05-07T19:42:58.3703457Z clflush size : 64 2025-05-07T19:42:58.3703537Z cache_alignment : 64 2025-05-07T19:42:58.3703674Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3703758Z power management: 2025-05-07T19:42:58.3703762Z 2025-05-07T19:42:58.3703842Z processor : 48 2025-05-07T19:42:58.3703936Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3704019Z cpu family : 6 2025-05-07T19:42:58.3704095Z model : 85 2025-05-07T19:42:58.3704256Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3704398Z stepping : 7 2025-05-07T19:42:58.3704484Z microcode : 0x5003901 2025-05-07T19:42:58.3704560Z cpu MHz : 3000.006 2025-05-07T19:42:58.3704654Z cache size : 36608 KB 2025-05-07T19:42:58.3704734Z physical id : 0 2025-05-07T19:42:58.3704816Z siblings : 48 2025-05-07T19:42:58.3704888Z core id : 0 2025-05-07T19:42:58.3704976Z cpu cores : 24 2025-05-07T19:42:58.3705053Z apicid : 1 2025-05-07T19:42:58.3705136Z initial apicid : 1 2025-05-07T19:42:58.3705221Z fpu : yes 2025-05-07T19:42:58.3705307Z fpu_exception : yes 2025-05-07T19:42:58.3705385Z cpuid level : 13 2025-05-07T19:42:58.3705459Z wp : yes 2025-05-07T19:42:58.3707698Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3708091Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3708181Z bogomips : 6000.01 2025-05-07T19:42:58.3708261Z clflush size : 64 2025-05-07T19:42:58.3708343Z cache_alignment : 64 2025-05-07T19:42:58.3708475Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3708563Z power management: 2025-05-07T19:42:58.3708568Z 2025-05-07T19:42:58.3708647Z processor : 49 2025-05-07T19:42:58.3708736Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3708821Z cpu family : 6 2025-05-07T19:42:58.3708899Z model : 85 2025-05-07T19:42:58.3709058Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3709136Z stepping : 7 2025-05-07T19:42:58.3709230Z microcode : 0x5003901 2025-05-07T19:42:58.3709313Z cpu MHz : 3000.006 2025-05-07T19:42:58.3709396Z cache size : 36608 KB 2025-05-07T19:42:58.3709488Z physical id : 0 2025-05-07T19:42:58.3709563Z siblings : 48 2025-05-07T19:42:58.3709749Z core id : 1 2025-05-07T19:42:58.3709821Z cpu cores : 24 2025-05-07T19:42:58.3709900Z apicid : 3 2025-05-07T19:42:58.3709977Z initial apicid : 3 2025-05-07T19:42:58.3710047Z fpu : yes 2025-05-07T19:42:58.3710126Z fpu_exception : yes 2025-05-07T19:42:58.3710208Z cpuid level : 13 2025-05-07T19:42:58.3710277Z wp : yes 2025-05-07T19:42:58.3712319Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3712691Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3712812Z bogomips : 6000.01 2025-05-07T19:42:58.3712897Z clflush size : 64 2025-05-07T19:42:58.3712975Z cache_alignment : 64 2025-05-07T19:42:58.3713092Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3713168Z power management: 2025-05-07T19:42:58.3713172Z 2025-05-07T19:42:58.3713253Z processor : 50 2025-05-07T19:42:58.3713335Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3713405Z cpu family : 6 2025-05-07T19:42:58.3713482Z model : 85 2025-05-07T19:42:58.3713628Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3713698Z stepping : 7 2025-05-07T19:42:58.3713774Z microcode : 0x5003901 2025-05-07T19:42:58.3713898Z cpu MHz : 3275.834 2025-05-07T19:42:58.3713972Z cache size : 36608 KB 2025-05-07T19:42:58.3714044Z physical id : 0 2025-05-07T19:42:58.3714122Z siblings : 48 2025-05-07T19:42:58.3714189Z core id : 2 2025-05-07T19:42:58.3714261Z cpu cores : 24 2025-05-07T19:42:58.3714329Z apicid : 5 2025-05-07T19:42:58.3714413Z initial apicid : 5 2025-05-07T19:42:58.3714478Z fpu : yes 2025-05-07T19:42:58.3714550Z fpu_exception : yes 2025-05-07T19:42:58.3714621Z cpuid level : 13 2025-05-07T19:42:58.3714697Z wp : yes 2025-05-07T19:42:58.3716753Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3717123Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3717202Z bogomips : 6000.01 2025-05-07T19:42:58.3717278Z clflush size : 64 2025-05-07T19:42:58.3717356Z cache_alignment : 64 2025-05-07T19:42:58.3717483Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3717562Z power management: 2025-05-07T19:42:58.3717567Z 2025-05-07T19:42:58.3717641Z processor : 51 2025-05-07T19:42:58.3717731Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3717803Z cpu family : 6 2025-05-07T19:42:58.3717874Z model : 85 2025-05-07T19:42:58.3718032Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3718104Z stepping : 7 2025-05-07T19:42:58.3718181Z microcode : 0x5003901 2025-05-07T19:42:58.3718255Z cpu MHz : 3000.006 2025-05-07T19:42:58.3718344Z cache size : 36608 KB 2025-05-07T19:42:58.3718417Z physical id : 0 2025-05-07T19:42:58.3718489Z siblings : 48 2025-05-07T19:42:58.3718561Z core id : 3 2025-05-07T19:42:58.3718644Z cpu cores : 24 2025-05-07T19:42:58.3718715Z apicid : 7 2025-05-07T19:42:58.3718794Z initial apicid : 7 2025-05-07T19:42:58.3718875Z fpu : yes 2025-05-07T19:42:58.3718952Z fpu_exception : yes 2025-05-07T19:42:58.3719022Z cpuid level : 13 2025-05-07T19:42:58.3719090Z wp : yes 2025-05-07T19:42:58.3721159Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3721522Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3721667Z bogomips : 6000.01 2025-05-07T19:42:58.3721742Z clflush size : 64 2025-05-07T19:42:58.3721818Z cache_alignment : 64 2025-05-07T19:42:58.3721935Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3722019Z power management: 2025-05-07T19:42:58.3722023Z 2025-05-07T19:42:58.3722095Z processor : 52 2025-05-07T19:42:58.3722172Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3722250Z cpu family : 6 2025-05-07T19:42:58.3722318Z model : 85 2025-05-07T19:42:58.3722464Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3722538Z stepping : 7 2025-05-07T19:42:58.3722624Z microcode : 0x5003901 2025-05-07T19:42:58.3722695Z cpu MHz : 3385.279 2025-05-07T19:42:58.3722772Z cache size : 36608 KB 2025-05-07T19:42:58.3722896Z physical id : 0 2025-05-07T19:42:58.3722968Z siblings : 48 2025-05-07T19:42:58.3723044Z core id : 4 2025-05-07T19:42:58.3723118Z cpu cores : 24 2025-05-07T19:42:58.3723211Z apicid : 9 2025-05-07T19:42:58.3723293Z initial apicid : 9 2025-05-07T19:42:58.3723372Z fpu : yes 2025-05-07T19:42:58.3723456Z fpu_exception : yes 2025-05-07T19:42:58.3723528Z cpuid level : 13 2025-05-07T19:42:58.3723598Z wp : yes 2025-05-07T19:42:58.3725672Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3726035Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3726109Z bogomips : 6000.01 2025-05-07T19:42:58.3726197Z clflush size : 64 2025-05-07T19:42:58.3726278Z cache_alignment : 64 2025-05-07T19:42:58.3726395Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3726470Z power management: 2025-05-07T19:42:58.3726474Z 2025-05-07T19:42:58.3726557Z processor : 53 2025-05-07T19:42:58.3726642Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3726714Z cpu family : 6 2025-05-07T19:42:58.3726794Z model : 85 2025-05-07T19:42:58.3726941Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3727014Z stepping : 7 2025-05-07T19:42:58.3727092Z microcode : 0x5003901 2025-05-07T19:42:58.3727176Z cpu MHz : 3000.006 2025-05-07T19:42:58.3727252Z cache size : 36608 KB 2025-05-07T19:42:58.3727325Z physical id : 0 2025-05-07T19:42:58.3727407Z siblings : 48 2025-05-07T19:42:58.3727478Z core id : 5 2025-05-07T19:42:58.3727552Z cpu cores : 24 2025-05-07T19:42:58.3727621Z apicid : 11 2025-05-07T19:42:58.3727708Z initial apicid : 11 2025-05-07T19:42:58.3727780Z fpu : yes 2025-05-07T19:42:58.3727860Z fpu_exception : yes 2025-05-07T19:42:58.3727945Z cpuid level : 13 2025-05-07T19:42:58.3728014Z wp : yes 2025-05-07T19:42:58.3730055Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3730427Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3730501Z bogomips : 6000.01 2025-05-07T19:42:58.3730574Z clflush size : 64 2025-05-07T19:42:58.3730716Z cache_alignment : 64 2025-05-07T19:42:58.3730833Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3730908Z power management: 2025-05-07T19:42:58.3730912Z 2025-05-07T19:42:58.3730982Z processor : 54 2025-05-07T19:42:58.3731069Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3731139Z cpu family : 6 2025-05-07T19:42:58.3731209Z model : 85 2025-05-07T19:42:58.3731364Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3731433Z stepping : 7 2025-05-07T19:42:58.3731515Z microcode : 0x5003901 2025-05-07T19:42:58.3731588Z cpu MHz : 3000.006 2025-05-07T19:42:58.3731672Z cache size : 36608 KB 2025-05-07T19:42:58.3731746Z physical id : 0 2025-05-07T19:42:58.3731819Z siblings : 48 2025-05-07T19:42:58.3731896Z core id : 6 2025-05-07T19:42:58.3732008Z cpu cores : 24 2025-05-07T19:42:58.3732078Z apicid : 13 2025-05-07T19:42:58.3732152Z initial apicid : 13 2025-05-07T19:42:58.3732228Z fpu : yes 2025-05-07T19:42:58.3732303Z fpu_exception : yes 2025-05-07T19:42:58.3732380Z cpuid level : 13 2025-05-07T19:42:58.3732459Z wp : yes 2025-05-07T19:42:58.3734497Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3734861Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3734943Z bogomips : 6000.01 2025-05-07T19:42:58.3735016Z clflush size : 64 2025-05-07T19:42:58.3735091Z cache_alignment : 64 2025-05-07T19:42:58.3735218Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3735293Z power management: 2025-05-07T19:42:58.3735297Z 2025-05-07T19:42:58.3735368Z processor : 55 2025-05-07T19:42:58.3735447Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3735525Z cpu family : 6 2025-05-07T19:42:58.3735592Z model : 85 2025-05-07T19:42:58.3735734Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3735812Z stepping : 7 2025-05-07T19:42:58.3735887Z microcode : 0x5003901 2025-05-07T19:42:58.3735957Z cpu MHz : 3000.006 2025-05-07T19:42:58.3736029Z cache size : 36608 KB 2025-05-07T19:42:58.3736115Z physical id : 0 2025-05-07T19:42:58.3736185Z siblings : 48 2025-05-07T19:42:58.3736253Z core id : 7 2025-05-07T19:42:58.3736331Z cpu cores : 24 2025-05-07T19:42:58.3736407Z apicid : 15 2025-05-07T19:42:58.3736481Z initial apicid : 15 2025-05-07T19:42:58.3736551Z fpu : yes 2025-05-07T19:42:58.3736638Z fpu_exception : yes 2025-05-07T19:42:58.3736713Z cpuid level : 13 2025-05-07T19:42:58.3736785Z wp : yes 2025-05-07T19:42:58.3739196Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3739595Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3739681Z bogomips : 6000.01 2025-05-07T19:42:58.3739768Z clflush size : 64 2025-05-07T19:42:58.3739852Z cache_alignment : 64 2025-05-07T19:42:58.3739981Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3740123Z power management: 2025-05-07T19:42:58.3740127Z 2025-05-07T19:42:58.3740204Z processor : 56 2025-05-07T19:42:58.3740292Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3740371Z cpu family : 6 2025-05-07T19:42:58.3740450Z model : 85 2025-05-07T19:42:58.3740610Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3740687Z stepping : 7 2025-05-07T19:42:58.3740774Z microcode : 0x5003901 2025-05-07T19:42:58.3740851Z cpu MHz : 3276.658 2025-05-07T19:42:58.3740930Z cache size : 36608 KB 2025-05-07T19:42:58.3741006Z physical id : 0 2025-05-07T19:42:58.3741089Z siblings : 48 2025-05-07T19:42:58.3741164Z core id : 8 2025-05-07T19:42:58.3741240Z cpu cores : 24 2025-05-07T19:42:58.3741316Z apicid : 17 2025-05-07T19:42:58.3741453Z initial apicid : 17 2025-05-07T19:42:58.3741530Z fpu : yes 2025-05-07T19:42:58.3741611Z fpu_exception : yes 2025-05-07T19:42:58.3741696Z cpuid level : 13 2025-05-07T19:42:58.3741770Z wp : yes 2025-05-07T19:42:58.3743977Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3744376Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3744457Z bogomips : 6000.01 2025-05-07T19:42:58.3744540Z clflush size : 64 2025-05-07T19:42:58.3744634Z cache_alignment : 64 2025-05-07T19:42:58.3744764Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3744847Z power management: 2025-05-07T19:42:58.3744856Z 2025-05-07T19:42:58.3744943Z processor : 57 2025-05-07T19:42:58.3745029Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3745111Z cpu family : 6 2025-05-07T19:42:58.3745189Z model : 85 2025-05-07T19:42:58.3745356Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3745433Z stepping : 7 2025-05-07T19:42:58.3745520Z microcode : 0x5003901 2025-05-07T19:42:58.3745602Z cpu MHz : 3252.791 2025-05-07T19:42:58.3745679Z cache size : 36608 KB 2025-05-07T19:42:58.3745758Z physical id : 0 2025-05-07T19:42:58.3745837Z siblings : 48 2025-05-07T19:42:58.3745908Z core id : 9 2025-05-07T19:42:58.3745982Z cpu cores : 24 2025-05-07T19:42:58.3746057Z apicid : 19 2025-05-07T19:42:58.3746145Z initial apicid : 19 2025-05-07T19:42:58.3746219Z fpu : yes 2025-05-07T19:42:58.3746300Z fpu_exception : yes 2025-05-07T19:42:58.3746385Z cpuid level : 13 2025-05-07T19:42:58.3746459Z wp : yes 2025-05-07T19:42:58.3748660Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3749058Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3749137Z bogomips : 6000.01 2025-05-07T19:42:58.3749216Z clflush size : 64 2025-05-07T19:42:58.3749306Z cache_alignment : 64 2025-05-07T19:42:58.3749433Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3749513Z power management: 2025-05-07T19:42:58.3749517Z 2025-05-07T19:42:58.3749754Z processor : 58 2025-05-07T19:42:58.3749841Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3749911Z cpu family : 6 2025-05-07T19:42:58.3749978Z model : 85 2025-05-07T19:42:58.3750127Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3750197Z stepping : 7 2025-05-07T19:42:58.3750270Z microcode : 0x5003901 2025-05-07T19:42:58.3750339Z cpu MHz : 3000.006 2025-05-07T19:42:58.3750419Z cache size : 36608 KB 2025-05-07T19:42:58.3750493Z physical id : 0 2025-05-07T19:42:58.3750562Z siblings : 48 2025-05-07T19:42:58.3750640Z core id : 10 2025-05-07T19:42:58.3750710Z cpu cores : 24 2025-05-07T19:42:58.3750779Z apicid : 21 2025-05-07T19:42:58.3750853Z initial apicid : 21 2025-05-07T19:42:58.3750929Z fpu : yes 2025-05-07T19:42:58.3751002Z fpu_exception : yes 2025-05-07T19:42:58.3751130Z cpuid level : 13 2025-05-07T19:42:58.3751199Z wp : yes 2025-05-07T19:42:58.3753237Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3753602Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3753682Z bogomips : 6000.01 2025-05-07T19:42:58.3753753Z clflush size : 64 2025-05-07T19:42:58.3753829Z cache_alignment : 64 2025-05-07T19:42:58.3753958Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3754032Z power management: 2025-05-07T19:42:58.3754036Z 2025-05-07T19:42:58.3754105Z processor : 59 2025-05-07T19:42:58.3754191Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3754268Z cpu family : 6 2025-05-07T19:42:58.3754335Z model : 85 2025-05-07T19:42:58.3754480Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3754561Z stepping : 7 2025-05-07T19:42:58.3754635Z microcode : 0x5003901 2025-05-07T19:42:58.3754705Z cpu MHz : 3791.136 2025-05-07T19:42:58.3754777Z cache size : 36608 KB 2025-05-07T19:42:58.3754855Z physical id : 0 2025-05-07T19:42:58.3754925Z siblings : 48 2025-05-07T19:42:58.3754994Z core id : 11 2025-05-07T19:42:58.3755073Z cpu cores : 24 2025-05-07T19:42:58.3755141Z apicid : 23 2025-05-07T19:42:58.3755213Z initial apicid : 23 2025-05-07T19:42:58.3755283Z fpu : yes 2025-05-07T19:42:58.3755363Z fpu_exception : yes 2025-05-07T19:42:58.3755435Z cpuid level : 13 2025-05-07T19:42:58.3755507Z wp : yes 2025-05-07T19:42:58.3757552Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3757913Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3757989Z bogomips : 6000.01 2025-05-07T19:42:58.3758067Z clflush size : 64 2025-05-07T19:42:58.3758141Z cache_alignment : 64 2025-05-07T19:42:58.3758260Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3758339Z power management: 2025-05-07T19:42:58.3758343Z 2025-05-07T19:42:58.3758413Z processor : 60 2025-05-07T19:42:58.3758490Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3758609Z cpu family : 6 2025-05-07T19:42:58.3758686Z model : 85 2025-05-07T19:42:58.3758832Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3758901Z stepping : 7 2025-05-07T19:42:58.3758984Z microcode : 0x5003901 2025-05-07T19:42:58.3759054Z cpu MHz : 3289.162 2025-05-07T19:42:58.3759127Z cache size : 36608 KB 2025-05-07T19:42:58.3759197Z physical id : 0 2025-05-07T19:42:58.3759277Z siblings : 48 2025-05-07T19:42:58.3759345Z core id : 12 2025-05-07T19:42:58.3759416Z cpu cores : 24 2025-05-07T19:42:58.3759485Z apicid : 25 2025-05-07T19:42:58.3759567Z initial apicid : 25 2025-05-07T19:42:58.3759633Z fpu : yes 2025-05-07T19:42:58.3759708Z fpu_exception : yes 2025-05-07T19:42:58.3759788Z cpuid level : 13 2025-05-07T19:42:58.3759856Z wp : yes 2025-05-07T19:42:58.3761948Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3762321Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3762391Z bogomips : 6000.01 2025-05-07T19:42:58.3762459Z clflush size : 64 2025-05-07T19:42:58.3762538Z cache_alignment : 64 2025-05-07T19:42:58.3762653Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3762734Z power management: 2025-05-07T19:42:58.3762738Z 2025-05-07T19:42:58.3762815Z processor : 61 2025-05-07T19:42:58.3762894Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3762963Z cpu family : 6 2025-05-07T19:42:58.3763029Z model : 85 2025-05-07T19:42:58.3763183Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3763254Z stepping : 7 2025-05-07T19:42:58.3763334Z microcode : 0x5003901 2025-05-07T19:42:58.3763416Z cpu MHz : 3222.307 2025-05-07T19:42:58.3763492Z cache size : 36608 KB 2025-05-07T19:42:58.3763566Z physical id : 0 2025-05-07T19:42:58.3763637Z siblings : 48 2025-05-07T19:42:58.3763712Z core id : 13 2025-05-07T19:42:58.3763784Z cpu cores : 24 2025-05-07T19:42:58.3763853Z apicid : 27 2025-05-07T19:42:58.3763932Z initial apicid : 27 2025-05-07T19:42:58.3764010Z fpu : yes 2025-05-07T19:42:58.3764088Z fpu_exception : yes 2025-05-07T19:42:58.3764160Z cpuid level : 13 2025-05-07T19:42:58.3764233Z wp : yes 2025-05-07T19:42:58.3766275Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3766634Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3766716Z bogomips : 6000.01 2025-05-07T19:42:58.3766789Z clflush size : 64 2025-05-07T19:42:58.3766865Z cache_alignment : 64 2025-05-07T19:42:58.3766989Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3767062Z power management: 2025-05-07T19:42:58.3767066Z 2025-05-07T19:42:58.3767139Z processor : 62 2025-05-07T19:42:58.3767217Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3767295Z cpu family : 6 2025-05-07T19:42:58.3767534Z model : 85 2025-05-07T19:42:58.3767849Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3767993Z stepping : 7 2025-05-07T19:42:58.3768077Z microcode : 0x5003901 2025-05-07T19:42:58.3768157Z cpu MHz : 3000.006 2025-05-07T19:42:58.3768238Z cache size : 36608 KB 2025-05-07T19:42:58.3768325Z physical id : 0 2025-05-07T19:42:58.3768403Z siblings : 48 2025-05-07T19:42:58.3768503Z core id : 14 2025-05-07T19:42:58.3768588Z cpu cores : 24 2025-05-07T19:42:58.3768662Z apicid : 29 2025-05-07T19:42:58.3768743Z initial apicid : 29 2025-05-07T19:42:58.3768817Z fpu : yes 2025-05-07T19:42:58.3768906Z fpu_exception : yes 2025-05-07T19:42:58.3768982Z cpuid level : 13 2025-05-07T19:42:58.3769057Z wp : yes 2025-05-07T19:42:58.3771359Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3771754Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3771843Z bogomips : 6000.01 2025-05-07T19:42:58.3771930Z clflush size : 64 2025-05-07T19:42:58.3772009Z cache_alignment : 64 2025-05-07T19:42:58.3772135Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3772224Z power management: 2025-05-07T19:42:58.3772229Z 2025-05-07T19:42:58.3772307Z processor : 63 2025-05-07T19:42:58.3772396Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3772474Z cpu family : 6 2025-05-07T19:42:58.3772556Z model : 85 2025-05-07T19:42:58.3772717Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3772798Z stepping : 7 2025-05-07T19:42:58.3772888Z microcode : 0x5003901 2025-05-07T19:42:58.3772964Z cpu MHz : 3000.006 2025-05-07T19:42:58.3773046Z cache size : 36608 KB 2025-05-07T19:42:58.3773127Z physical id : 0 2025-05-07T19:42:58.3773210Z siblings : 48 2025-05-07T19:42:58.3773284Z core id : 15 2025-05-07T19:42:58.3773363Z cpu cores : 24 2025-05-07T19:42:58.3773446Z apicid : 31 2025-05-07T19:42:58.3773525Z initial apicid : 31 2025-05-07T19:42:58.3773599Z fpu : yes 2025-05-07T19:42:58.3773681Z fpu_exception : yes 2025-05-07T19:42:58.3773762Z cpuid level : 13 2025-05-07T19:42:58.3773834Z wp : yes 2025-05-07T19:42:58.3776051Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3776453Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3776531Z bogomips : 6000.01 2025-05-07T19:42:58.3776608Z clflush size : 64 2025-05-07T19:42:58.3776696Z cache_alignment : 64 2025-05-07T19:42:58.3776821Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3776901Z power management: 2025-05-07T19:42:58.3776905Z 2025-05-07T19:42:58.3777039Z processor : 64 2025-05-07T19:42:58.3777133Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3777211Z cpu family : 6 2025-05-07T19:42:58.3777284Z model : 85 2025-05-07T19:42:58.3777450Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3777526Z stepping : 7 2025-05-07T19:42:58.3778514Z microcode : 0x5003901 2025-05-07T19:42:58.3778597Z cpu MHz : 3231.189 2025-05-07T19:42:58.3778678Z cache size : 36608 KB 2025-05-07T19:42:58.3778756Z physical id : 0 2025-05-07T19:42:58.3778835Z siblings : 48 2025-05-07T19:42:58.3778918Z core id : 16 2025-05-07T19:42:58.3778995Z cpu cores : 24 2025-05-07T19:42:58.3779070Z apicid : 33 2025-05-07T19:42:58.3779158Z initial apicid : 33 2025-05-07T19:42:58.3779230Z fpu : yes 2025-05-07T19:42:58.3779311Z fpu_exception : yes 2025-05-07T19:42:58.3779386Z cpuid level : 13 2025-05-07T19:42:58.3779463Z wp : yes 2025-05-07T19:42:58.3781760Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3782166Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3782246Z bogomips : 6000.01 2025-05-07T19:42:58.3782330Z clflush size : 64 2025-05-07T19:42:58.3782414Z cache_alignment : 64 2025-05-07T19:42:58.3782553Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3782635Z power management: 2025-05-07T19:42:58.3782639Z 2025-05-07T19:42:58.3782718Z processor : 65 2025-05-07T19:42:58.3782814Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3782891Z cpu family : 6 2025-05-07T19:42:58.3782970Z model : 85 2025-05-07T19:42:58.3783134Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3783219Z stepping : 7 2025-05-07T19:42:58.3783299Z microcode : 0x5003901 2025-05-07T19:42:58.3783383Z cpu MHz : 3000.006 2025-05-07T19:42:58.3783471Z cache size : 36608 KB 2025-05-07T19:42:58.3783550Z physical id : 0 2025-05-07T19:42:58.3783624Z siblings : 48 2025-05-07T19:42:58.3783699Z core id : 17 2025-05-07T19:42:58.3783780Z cpu cores : 24 2025-05-07T19:42:58.3783857Z apicid : 35 2025-05-07T19:42:58.3783938Z initial apicid : 35 2025-05-07T19:42:58.3784017Z fpu : yes 2025-05-07T19:42:58.3784098Z fpu_exception : yes 2025-05-07T19:42:58.3784177Z cpuid level : 13 2025-05-07T19:42:58.3784250Z wp : yes 2025-05-07T19:42:58.3786472Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3786865Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3786951Z bogomips : 6000.01 2025-05-07T19:42:58.3787028Z clflush size : 64 2025-05-07T19:42:58.3787108Z cache_alignment : 64 2025-05-07T19:42:58.3787234Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3787320Z power management: 2025-05-07T19:42:58.3787324Z 2025-05-07T19:42:58.3787399Z processor : 66 2025-05-07T19:42:58.3787484Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3787565Z cpu family : 6 2025-05-07T19:42:58.3787637Z model : 85 2025-05-07T19:42:58.3787796Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3787876Z stepping : 7 2025-05-07T19:42:58.3787965Z microcode : 0x5003901 2025-05-07T19:42:58.3788043Z cpu MHz : 3246.783 2025-05-07T19:42:58.3788173Z cache size : 36608 KB 2025-05-07T19:42:58.3788260Z physical id : 0 2025-05-07T19:42:58.3788333Z siblings : 48 2025-05-07T19:42:58.3788405Z core id : 18 2025-05-07T19:42:58.3788479Z cpu cores : 24 2025-05-07T19:42:58.3788559Z apicid : 37 2025-05-07T19:42:58.3788636Z initial apicid : 37 2025-05-07T19:42:58.3788708Z fpu : yes 2025-05-07T19:42:58.3788790Z fpu_exception : yes 2025-05-07T19:42:58.3788873Z cpuid level : 13 2025-05-07T19:42:58.3788947Z wp : yes 2025-05-07T19:42:58.3791169Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3791542Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3791617Z bogomips : 6000.01 2025-05-07T19:42:58.3791695Z clflush size : 64 2025-05-07T19:42:58.3791767Z cache_alignment : 64 2025-05-07T19:42:58.3791883Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3791959Z power management: 2025-05-07T19:42:58.3791962Z 2025-05-07T19:42:58.3792039Z processor : 67 2025-05-07T19:42:58.3792116Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3792185Z cpu family : 6 2025-05-07T19:42:58.3792260Z model : 85 2025-05-07T19:42:58.3792404Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3792477Z stepping : 7 2025-05-07T19:42:58.3792551Z microcode : 0x5003901 2025-05-07T19:42:58.3792625Z cpu MHz : 3270.235 2025-05-07T19:42:58.3792697Z cache size : 36608 KB 2025-05-07T19:42:58.3792772Z physical id : 0 2025-05-07T19:42:58.3792846Z siblings : 48 2025-05-07T19:42:58.3792912Z core id : 19 2025-05-07T19:42:58.3792980Z cpu cores : 24 2025-05-07T19:42:58.3793049Z apicid : 39 2025-05-07T19:42:58.3793125Z initial apicid : 39 2025-05-07T19:42:58.3793191Z fpu : yes 2025-05-07T19:42:58.3793264Z fpu_exception : yes 2025-05-07T19:42:58.3793333Z cpuid level : 13 2025-05-07T19:42:58.3793403Z wp : yes 2025-05-07T19:42:58.3795439Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3795804Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3795876Z bogomips : 6000.01 2025-05-07T19:42:58.3795949Z clflush size : 64 2025-05-07T19:42:58.3796024Z cache_alignment : 64 2025-05-07T19:42:58.3796144Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3796216Z power management: 2025-05-07T19:42:58.3796221Z 2025-05-07T19:42:58.3796291Z processor : 68 2025-05-07T19:42:58.3796375Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3796443Z cpu family : 6 2025-05-07T19:42:58.3796510Z model : 85 2025-05-07T19:42:58.3796659Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3796735Z stepping : 7 2025-05-07T19:42:58.3796820Z microcode : 0x5003901 2025-05-07T19:42:58.3796895Z cpu MHz : 3000.006 2025-05-07T19:42:58.3796991Z cache size : 36608 KB 2025-05-07T19:42:58.3797069Z physical id : 0 2025-05-07T19:42:58.3797145Z siblings : 48 2025-05-07T19:42:58.3797263Z core id : 20 2025-05-07T19:42:58.3797351Z cpu cores : 24 2025-05-07T19:42:58.3797426Z apicid : 41 2025-05-07T19:42:58.3797506Z initial apicid : 41 2025-05-07T19:42:58.3797592Z fpu : yes 2025-05-07T19:42:58.3797673Z fpu_exception : yes 2025-05-07T19:42:58.3797752Z cpuid level : 13 2025-05-07T19:42:58.3797825Z wp : yes 2025-05-07T19:42:58.3799929Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3800299Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3800394Z bogomips : 6000.01 2025-05-07T19:42:58.3800474Z clflush size : 64 2025-05-07T19:42:58.3800557Z cache_alignment : 64 2025-05-07T19:42:58.3800680Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3800774Z power management: 2025-05-07T19:42:58.3800778Z 2025-05-07T19:42:58.3800853Z processor : 69 2025-05-07T19:42:58.3800938Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3801028Z cpu family : 6 2025-05-07T19:42:58.3801101Z model : 85 2025-05-07T19:42:58.3801252Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3801328Z stepping : 7 2025-05-07T19:42:58.3801419Z microcode : 0x5003901 2025-05-07T19:42:58.3801498Z cpu MHz : 3000.006 2025-05-07T19:42:58.3801577Z cache size : 36608 KB 2025-05-07T19:42:58.3801665Z physical id : 0 2025-05-07T19:42:58.3801742Z siblings : 48 2025-05-07T19:42:58.3801817Z core id : 21 2025-05-07T19:42:58.3801896Z cpu cores : 24 2025-05-07T19:42:58.3801981Z apicid : 43 2025-05-07T19:42:58.3802061Z initial apicid : 43 2025-05-07T19:42:58.3802134Z fpu : yes 2025-05-07T19:42:58.3802225Z fpu_exception : yes 2025-05-07T19:42:58.3802303Z cpuid level : 13 2025-05-07T19:42:58.3802375Z wp : yes 2025-05-07T19:42:58.3804439Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3804804Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3804885Z bogomips : 6000.01 2025-05-07T19:42:58.3804977Z clflush size : 64 2025-05-07T19:42:58.3805059Z cache_alignment : 64 2025-05-07T19:42:58.3805183Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3805263Z power management: 2025-05-07T19:42:58.3805267Z 2025-05-07T19:42:58.3805358Z processor : 70 2025-05-07T19:42:58.3805444Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3805519Z cpu family : 6 2025-05-07T19:42:58.3805606Z model : 85 2025-05-07T19:42:58.3805756Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3805831Z stepping : 7 2025-05-07T19:42:58.3805913Z microcode : 0x5003901 2025-05-07T19:42:58.3806004Z cpu MHz : 3000.006 2025-05-07T19:42:58.3806084Z cache size : 36608 KB 2025-05-07T19:42:58.3806162Z physical id : 0 2025-05-07T19:42:58.3806251Z siblings : 48 2025-05-07T19:42:58.3806324Z core id : 22 2025-05-07T19:42:58.3806399Z cpu cores : 24 2025-05-07T19:42:58.3806563Z apicid : 45 2025-05-07T19:42:58.3806656Z initial apicid : 45 2025-05-07T19:42:58.3806731Z fpu : yes 2025-05-07T19:42:58.3806810Z fpu_exception : yes 2025-05-07T19:42:58.3806900Z cpuid level : 13 2025-05-07T19:42:58.3806973Z wp : yes 2025-05-07T19:42:58.3809060Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3809440Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3809523Z bogomips : 6000.01 2025-05-07T19:42:58.3809600Z clflush size : 64 2025-05-07T19:42:58.3809694Z cache_alignment : 64 2025-05-07T19:42:58.3809816Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3809895Z power management: 2025-05-07T19:42:58.3809899Z 2025-05-07T19:42:58.3809976Z processor : 71 2025-05-07T19:42:58.3810072Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3810148Z cpu family : 6 2025-05-07T19:42:58.3810221Z model : 85 2025-05-07T19:42:58.3810385Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3810462Z stepping : 7 2025-05-07T19:42:58.3810543Z microcode : 0x5003901 2025-05-07T19:42:58.3810619Z cpu MHz : 3000.006 2025-05-07T19:42:58.3810711Z cache size : 36608 KB 2025-05-07T19:42:58.3810792Z physical id : 0 2025-05-07T19:42:58.3810867Z siblings : 48 2025-05-07T19:42:58.3810956Z core id : 23 2025-05-07T19:42:58.3811033Z cpu cores : 24 2025-05-07T19:42:58.3811111Z apicid : 47 2025-05-07T19:42:58.3811194Z initial apicid : 47 2025-05-07T19:42:58.3811286Z fpu : yes 2025-05-07T19:42:58.3811369Z fpu_exception : yes 2025-05-07T19:42:58.3811445Z cpuid level : 13 2025-05-07T19:42:58.3811533Z wp : yes 2025-05-07T19:42:58.3813593Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3813959Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3814053Z bogomips : 6000.01 2025-05-07T19:42:58.3814130Z clflush size : 64 2025-05-07T19:42:58.3814209Z cache_alignment : 64 2025-05-07T19:42:58.3814344Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3814424Z power management: 2025-05-07T19:42:58.3814428Z 2025-05-07T19:42:58.3814505Z processor : 72 2025-05-07T19:42:58.3814591Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3814678Z cpu family : 6 2025-05-07T19:42:58.3814753Z model : 85 2025-05-07T19:42:58.3814903Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3814991Z stepping : 7 2025-05-07T19:42:58.3815071Z microcode : 0x5003901 2025-05-07T19:42:58.3815148Z cpu MHz : 3000.006 2025-05-07T19:42:58.3815235Z cache size : 36608 KB 2025-05-07T19:42:58.3815337Z physical id : 1 2025-05-07T19:42:58.3815411Z siblings : 48 2025-05-07T19:42:58.3815484Z core id : 0 2025-05-07T19:42:58.3815565Z cpu cores : 24 2025-05-07T19:42:58.3815634Z apicid : 65 2025-05-07T19:42:58.3815714Z initial apicid : 65 2025-05-07T19:42:58.3815837Z fpu : yes 2025-05-07T19:42:58.3815920Z fpu_exception : yes 2025-05-07T19:42:58.3815991Z cpuid level : 13 2025-05-07T19:42:58.3816062Z wp : yes 2025-05-07T19:42:58.3818433Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3818885Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3818967Z bogomips : 6000.01 2025-05-07T19:42:58.3819059Z clflush size : 64 2025-05-07T19:42:58.3819146Z cache_alignment : 64 2025-05-07T19:42:58.3819274Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3819373Z power management: 2025-05-07T19:42:58.3819377Z 2025-05-07T19:42:58.3819458Z processor : 73 2025-05-07T19:42:58.3819543Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3819619Z cpu family : 6 2025-05-07T19:42:58.3819708Z model : 85 2025-05-07T19:42:58.3819867Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3819944Z stepping : 7 2025-05-07T19:42:58.3820041Z microcode : 0x5003901 2025-05-07T19:42:58.3820115Z cpu MHz : 3000.006 2025-05-07T19:42:58.3820197Z cache size : 36608 KB 2025-05-07T19:42:58.3820274Z physical id : 1 2025-05-07T19:42:58.3820360Z siblings : 48 2025-05-07T19:42:58.3820437Z core id : 1 2025-05-07T19:42:58.3820514Z cpu cores : 24 2025-05-07T19:42:58.3820589Z apicid : 67 2025-05-07T19:42:58.3820682Z initial apicid : 67 2025-05-07T19:42:58.3820755Z fpu : yes 2025-05-07T19:42:58.3820843Z fpu_exception : yes 2025-05-07T19:42:58.3820937Z cpuid level : 13 2025-05-07T19:42:58.3821010Z wp : yes 2025-05-07T19:42:58.3823209Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3823620Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3823705Z bogomips : 6000.01 2025-05-07T19:42:58.3823785Z clflush size : 64 2025-05-07T19:42:58.3823877Z cache_alignment : 64 2025-05-07T19:42:58.3824012Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3824093Z power management: 2025-05-07T19:42:58.3824098Z 2025-05-07T19:42:58.3824184Z processor : 74 2025-05-07T19:42:58.3824273Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3824351Z cpu family : 6 2025-05-07T19:42:58.3824429Z model : 85 2025-05-07T19:42:58.3824605Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3824687Z stepping : 7 2025-05-07T19:42:58.3824767Z microcode : 0x5003901 2025-05-07T19:42:58.3824854Z cpu MHz : 3000.006 2025-05-07T19:42:58.3824941Z cache size : 36608 KB 2025-05-07T19:42:58.3825021Z physical id : 1 2025-05-07T19:42:58.3825097Z siblings : 48 2025-05-07T19:42:58.3825184Z core id : 2 2025-05-07T19:42:58.3825263Z cpu cores : 24 2025-05-07T19:42:58.3825336Z apicid : 69 2025-05-07T19:42:58.3825415Z initial apicid : 69 2025-05-07T19:42:58.3825504Z fpu : yes 2025-05-07T19:42:58.3825588Z fpu_exception : yes 2025-05-07T19:42:58.3825666Z cpuid level : 13 2025-05-07T19:42:58.3825819Z wp : yes 2025-05-07T19:42:58.3828016Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3828457Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3828552Z bogomips : 6000.01 2025-05-07T19:42:58.3828629Z clflush size : 64 2025-05-07T19:42:58.3828708Z cache_alignment : 64 2025-05-07T19:42:58.3828845Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3828927Z power management: 2025-05-07T19:42:58.3828931Z 2025-05-07T19:42:58.3829012Z processor : 75 2025-05-07T19:42:58.3829101Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3829190Z cpu family : 6 2025-05-07T19:42:58.3829264Z model : 85 2025-05-07T19:42:58.3829523Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3829604Z stepping : 7 2025-05-07T19:42:58.3829681Z microcode : 0x5003901 2025-05-07T19:42:58.3829756Z cpu MHz : 1797.256 2025-05-07T19:42:58.3829831Z cache size : 36608 KB 2025-05-07T19:42:58.3829915Z physical id : 1 2025-05-07T19:42:58.3829986Z siblings : 48 2025-05-07T19:42:58.3830057Z core id : 3 2025-05-07T19:42:58.3830140Z cpu cores : 24 2025-05-07T19:42:58.3830207Z apicid : 71 2025-05-07T19:42:58.3830286Z initial apicid : 71 2025-05-07T19:42:58.3830356Z fpu : yes 2025-05-07T19:42:58.3830439Z fpu_exception : yes 2025-05-07T19:42:58.3830511Z cpuid level : 13 2025-05-07T19:42:58.3830581Z wp : yes 2025-05-07T19:42:58.3832642Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3833001Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3833080Z bogomips : 6000.01 2025-05-07T19:42:58.3833166Z clflush size : 64 2025-05-07T19:42:58.3833241Z cache_alignment : 64 2025-05-07T19:42:58.3833367Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3833462Z power management: 2025-05-07T19:42:58.3833466Z 2025-05-07T19:42:58.3833536Z processor : 76 2025-05-07T19:42:58.3833623Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3833699Z cpu family : 6 2025-05-07T19:42:58.3833781Z model : 85 2025-05-07T19:42:58.3833929Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3834007Z stepping : 7 2025-05-07T19:42:58.3834098Z microcode : 0x5003901 2025-05-07T19:42:58.3834171Z cpu MHz : 1200.571 2025-05-07T19:42:58.3834246Z cache size : 36608 KB 2025-05-07T19:42:58.3834326Z physical id : 1 2025-05-07T19:42:58.3834404Z siblings : 48 2025-05-07T19:42:58.3834476Z core id : 4 2025-05-07T19:42:58.3834548Z cpu cores : 24 2025-05-07T19:42:58.3834634Z apicid : 73 2025-05-07T19:42:58.3834711Z initial apicid : 73 2025-05-07T19:42:58.3834787Z fpu : yes 2025-05-07T19:42:58.3834864Z fpu_exception : yes 2025-05-07T19:42:58.3834946Z cpuid level : 13 2025-05-07T19:42:58.3835013Z wp : yes 2025-05-07T19:42:58.3837049Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3837464Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3837541Z bogomips : 6000.01 2025-05-07T19:42:58.3837660Z clflush size : 64 2025-05-07T19:42:58.3837744Z cache_alignment : 64 2025-05-07T19:42:58.3837866Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3837943Z power management: 2025-05-07T19:42:58.3837951Z 2025-05-07T19:42:58.3838035Z processor : 77 2025-05-07T19:42:58.3838116Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3838188Z cpu family : 6 2025-05-07T19:42:58.3838256Z model : 85 2025-05-07T19:42:58.3838413Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3838487Z stepping : 7 2025-05-07T19:42:58.3838562Z microcode : 0x5003901 2025-05-07T19:42:58.3838639Z cpu MHz : 1200.616 2025-05-07T19:42:58.3838720Z cache size : 36608 KB 2025-05-07T19:42:58.3838795Z physical id : 1 2025-05-07T19:42:58.3838863Z siblings : 48 2025-05-07T19:42:58.3838948Z core id : 5 2025-05-07T19:42:58.3839017Z cpu cores : 24 2025-05-07T19:42:58.3839089Z apicid : 75 2025-05-07T19:42:58.3839172Z initial apicid : 75 2025-05-07T19:42:58.3839243Z fpu : yes 2025-05-07T19:42:58.3839325Z fpu_exception : yes 2025-05-07T19:42:58.3839400Z cpuid level : 13 2025-05-07T19:42:58.3839480Z wp : yes 2025-05-07T19:42:58.3841518Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3841891Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3841965Z bogomips : 6000.01 2025-05-07T19:42:58.3842037Z clflush size : 64 2025-05-07T19:42:58.3842115Z cache_alignment : 64 2025-05-07T19:42:58.3842251Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3842330Z power management: 2025-05-07T19:42:58.3842334Z 2025-05-07T19:42:58.3842404Z processor : 78 2025-05-07T19:42:58.3842504Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3842581Z cpu family : 6 2025-05-07T19:42:58.3842649Z model : 85 2025-05-07T19:42:58.3842798Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3842880Z stepping : 7 2025-05-07T19:42:58.3842956Z microcode : 0x5003901 2025-05-07T19:42:58.3843031Z cpu MHz : 1200.278 2025-05-07T19:42:58.3843114Z cache size : 36608 KB 2025-05-07T19:42:58.3843185Z physical id : 1 2025-05-07T19:42:58.3843267Z siblings : 48 2025-05-07T19:42:58.3843338Z core id : 6 2025-05-07T19:42:58.3843417Z cpu cores : 24 2025-05-07T19:42:58.3843492Z apicid : 77 2025-05-07T19:42:58.3843567Z initial apicid : 77 2025-05-07T19:42:58.3843646Z fpu : yes 2025-05-07T19:42:58.3843727Z fpu_exception : yes 2025-05-07T19:42:58.3843801Z cpuid level : 13 2025-05-07T19:42:58.3843869Z wp : yes 2025-05-07T19:42:58.3845915Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3846322Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3846413Z bogomips : 6000.01 2025-05-07T19:42:58.3846487Z clflush size : 64 2025-05-07T19:42:58.3846566Z cache_alignment : 64 2025-05-07T19:42:58.3846729Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3846815Z power management: 2025-05-07T19:42:58.3846819Z 2025-05-07T19:42:58.3846890Z processor : 79 2025-05-07T19:42:58.3846976Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3847055Z cpu family : 6 2025-05-07T19:42:58.3847124Z model : 85 2025-05-07T19:42:58.3847268Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3847343Z stepping : 7 2025-05-07T19:42:58.3847425Z microcode : 0x5003901 2025-05-07T19:42:58.3847494Z cpu MHz : 3000.006 2025-05-07T19:42:58.3847569Z cache size : 36608 KB 2025-05-07T19:42:58.3847651Z physical id : 1 2025-05-07T19:42:58.3847719Z siblings : 48 2025-05-07T19:42:58.3847790Z core id : 7 2025-05-07T19:42:58.3847862Z cpu cores : 24 2025-05-07T19:42:58.3847945Z apicid : 79 2025-05-07T19:42:58.3848017Z initial apicid : 79 2025-05-07T19:42:58.3848084Z fpu : yes 2025-05-07T19:42:58.3848156Z fpu_exception : yes 2025-05-07T19:42:58.3848231Z cpuid level : 13 2025-05-07T19:42:58.3848302Z wp : yes 2025-05-07T19:42:58.3850330Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3850702Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3850773Z bogomips : 6000.01 2025-05-07T19:42:58.3850844Z clflush size : 64 2025-05-07T19:42:58.3850927Z cache_alignment : 64 2025-05-07T19:42:58.3851042Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3851118Z power management: 2025-05-07T19:42:58.3851121Z 2025-05-07T19:42:58.3851203Z processor : 80 2025-05-07T19:42:58.3851289Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3851362Z cpu family : 6 2025-05-07T19:42:58.3851441Z model : 85 2025-05-07T19:42:58.3851592Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3851663Z stepping : 7 2025-05-07T19:42:58.3851740Z microcode : 0x5003901 2025-05-07T19:42:58.3851826Z cpu MHz : 3000.006 2025-05-07T19:42:58.3851902Z cache size : 36608 KB 2025-05-07T19:42:58.3851974Z physical id : 1 2025-05-07T19:42:58.3852047Z siblings : 48 2025-05-07T19:42:58.3852123Z core id : 8 2025-05-07T19:42:58.3852196Z cpu cores : 24 2025-05-07T19:42:58.3852264Z apicid : 81 2025-05-07T19:42:58.3852348Z initial apicid : 81 2025-05-07T19:42:58.3852422Z fpu : yes 2025-05-07T19:42:58.3852497Z fpu_exception : yes 2025-05-07T19:42:58.3852567Z cpuid level : 13 2025-05-07T19:42:58.3852643Z wp : yes 2025-05-07T19:42:58.3854692Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3855110Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3855186Z bogomips : 6000.01 2025-05-07T19:42:58.3855258Z clflush size : 64 2025-05-07T19:42:58.3855334Z cache_alignment : 64 2025-05-07T19:42:58.3855465Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3855538Z power management: 2025-05-07T19:42:58.3855588Z 2025-05-07T19:42:58.3855661Z processor : 81 2025-05-07T19:42:58.3855749Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3855819Z cpu family : 6 2025-05-07T19:42:58.3855886Z model : 85 2025-05-07T19:42:58.3856031Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3856105Z stepping : 7 2025-05-07T19:42:58.3856177Z microcode : 0x5003901 2025-05-07T19:42:58.3856249Z cpu MHz : 3000.006 2025-05-07T19:42:58.3856325Z cache size : 36608 KB 2025-05-07T19:42:58.3856400Z physical id : 1 2025-05-07T19:42:58.3856468Z siblings : 48 2025-05-07T19:42:58.3856535Z core id : 9 2025-05-07T19:42:58.3856609Z cpu cores : 24 2025-05-07T19:42:58.3856676Z apicid : 83 2025-05-07T19:42:58.3856749Z initial apicid : 83 2025-05-07T19:42:58.3856820Z fpu : yes 2025-05-07T19:42:58.3856895Z fpu_exception : yes 2025-05-07T19:42:58.3856965Z cpuid level : 13 2025-05-07T19:42:58.3857098Z wp : yes 2025-05-07T19:42:58.3859457Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3859850Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3859942Z bogomips : 6000.01 2025-05-07T19:42:58.3860019Z clflush size : 64 2025-05-07T19:42:58.3860097Z cache_alignment : 64 2025-05-07T19:42:58.3860221Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3860305Z power management: 2025-05-07T19:42:58.3860310Z 2025-05-07T19:42:58.3860389Z processor : 82 2025-05-07T19:42:58.3860474Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3860558Z cpu family : 6 2025-05-07T19:42:58.3860629Z model : 85 2025-05-07T19:42:58.3860786Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3860863Z stepping : 7 2025-05-07T19:42:58.3860954Z microcode : 0x5003901 2025-05-07T19:42:58.3861034Z cpu MHz : 1199.378 2025-05-07T19:42:58.3861115Z cache size : 36608 KB 2025-05-07T19:42:58.3861204Z physical id : 1 2025-05-07T19:42:58.3861281Z siblings : 48 2025-05-07T19:42:58.3861356Z core id : 10 2025-05-07T19:42:58.3861429Z cpu cores : 24 2025-05-07T19:42:58.3861519Z apicid : 85 2025-05-07T19:42:58.3861599Z initial apicid : 85 2025-05-07T19:42:58.3861672Z fpu : yes 2025-05-07T19:42:58.3861759Z fpu_exception : yes 2025-05-07T19:42:58.3861839Z cpuid level : 13 2025-05-07T19:42:58.3861916Z wp : yes 2025-05-07T19:42:58.3864139Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3864596Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3864679Z bogomips : 6000.01 2025-05-07T19:42:58.3864772Z clflush size : 64 2025-05-07T19:42:58.3864857Z cache_alignment : 64 2025-05-07T19:42:58.3864990Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3865070Z power management: 2025-05-07T19:42:58.3865075Z 2025-05-07T19:42:58.3865160Z processor : 83 2025-05-07T19:42:58.3865298Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3865379Z cpu family : 6 2025-05-07T19:42:58.3865460Z model : 85 2025-05-07T19:42:58.3865615Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3865690Z stepping : 7 2025-05-07T19:42:58.3865773Z microcode : 0x5003901 2025-05-07T19:42:58.3865855Z cpu MHz : 3000.006 2025-05-07T19:42:58.3865932Z cache size : 36608 KB 2025-05-07T19:42:58.3866006Z physical id : 1 2025-05-07T19:42:58.3866086Z siblings : 48 2025-05-07T19:42:58.3866161Z core id : 11 2025-05-07T19:42:58.3866236Z cpu cores : 24 2025-05-07T19:42:58.3866308Z apicid : 87 2025-05-07T19:42:58.3866393Z initial apicid : 87 2025-05-07T19:42:58.3866466Z fpu : yes 2025-05-07T19:42:58.3866546Z fpu_exception : yes 2025-05-07T19:42:58.3866629Z cpuid level : 13 2025-05-07T19:42:58.3866702Z wp : yes 2025-05-07T19:42:58.3869074Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3869478Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3869558Z bogomips : 6000.01 2025-05-07T19:42:58.3869637Z clflush size : 64 2025-05-07T19:42:58.3869722Z cache_alignment : 64 2025-05-07T19:42:58.3869847Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3869926Z power management: 2025-05-07T19:42:58.3869931Z 2025-05-07T19:42:58.3870007Z processor : 84 2025-05-07T19:42:58.3870098Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3870172Z cpu family : 6 2025-05-07T19:42:58.3870249Z model : 85 2025-05-07T19:42:58.3870411Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3870486Z stepping : 7 2025-05-07T19:42:58.3870567Z microcode : 0x5003901 2025-05-07T19:42:58.3870646Z cpu MHz : 1199.362 2025-05-07T19:42:58.3870729Z cache size : 36608 KB 2025-05-07T19:42:58.3870812Z physical id : 1 2025-05-07T19:42:58.3870887Z siblings : 48 2025-05-07T19:42:58.3870967Z core id : 12 2025-05-07T19:42:58.3871043Z cpu cores : 24 2025-05-07T19:42:58.3871120Z apicid : 89 2025-05-07T19:42:58.3871203Z initial apicid : 89 2025-05-07T19:42:58.3871286Z fpu : yes 2025-05-07T19:42:58.3871370Z fpu_exception : yes 2025-05-07T19:42:58.3871448Z cpuid level : 13 2025-05-07T19:42:58.3871517Z wp : yes 2025-05-07T19:42:58.3873749Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3874228Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3874321Z bogomips : 6000.01 2025-05-07T19:42:58.3874400Z clflush size : 64 2025-05-07T19:42:58.3874480Z cache_alignment : 64 2025-05-07T19:42:58.3874619Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3874702Z power management: 2025-05-07T19:42:58.3874707Z 2025-05-07T19:42:58.3874785Z processor : 85 2025-05-07T19:42:58.3874870Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3874957Z cpu family : 6 2025-05-07T19:42:58.3875029Z model : 85 2025-05-07T19:42:58.3875251Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3875343Z stepping : 7 2025-05-07T19:42:58.3875426Z microcode : 0x5003901 2025-05-07T19:42:58.3875505Z cpu MHz : 1200.044 2025-05-07T19:42:58.3875590Z cache size : 36608 KB 2025-05-07T19:42:58.3875678Z physical id : 1 2025-05-07T19:42:58.3875760Z siblings : 48 2025-05-07T19:42:58.3875839Z core id : 13 2025-05-07T19:42:58.3875928Z cpu cores : 24 2025-05-07T19:42:58.3876001Z apicid : 91 2025-05-07T19:42:58.3876083Z initial apicid : 91 2025-05-07T19:42:58.3876159Z fpu : yes 2025-05-07T19:42:58.3876251Z fpu_exception : yes 2025-05-07T19:42:58.3876329Z cpuid level : 13 2025-05-07T19:42:58.3876406Z wp : yes 2025-05-07T19:42:58.3878640Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3879035Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3879114Z bogomips : 6000.01 2025-05-07T19:42:58.3879316Z clflush size : 64 2025-05-07T19:42:58.3879391Z cache_alignment : 64 2025-05-07T19:42:58.3879512Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3879601Z power management: 2025-05-07T19:42:58.3879605Z 2025-05-07T19:42:58.3879678Z processor : 86 2025-05-07T19:42:58.3879765Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3879835Z cpu family : 6 2025-05-07T19:42:58.3879917Z model : 85 2025-05-07T19:42:58.3880068Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3880142Z stepping : 7 2025-05-07T19:42:58.3880230Z microcode : 0x5003901 2025-05-07T19:42:58.3880307Z cpu MHz : 1200.226 2025-05-07T19:42:58.3880380Z cache size : 36608 KB 2025-05-07T19:42:58.3880456Z physical id : 1 2025-05-07T19:42:58.3880537Z siblings : 48 2025-05-07T19:42:58.3880607Z core id : 14 2025-05-07T19:42:58.3880677Z cpu cores : 24 2025-05-07T19:42:58.3880744Z apicid : 93 2025-05-07T19:42:58.3880824Z initial apicid : 93 2025-05-07T19:42:58.3880891Z fpu : yes 2025-05-07T19:42:58.3880967Z fpu_exception : yes 2025-05-07T19:42:58.3881046Z cpuid level : 13 2025-05-07T19:42:58.3881121Z wp : yes 2025-05-07T19:42:58.3883160Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3883582Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3883658Z bogomips : 6000.01 2025-05-07T19:42:58.3883732Z clflush size : 64 2025-05-07T19:42:58.3883826Z cache_alignment : 64 2025-05-07T19:42:58.3883949Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3884025Z power management: 2025-05-07T19:42:58.3884029Z 2025-05-07T19:42:58.3884111Z processor : 87 2025-05-07T19:42:58.3884189Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3884265Z cpu family : 6 2025-05-07T19:42:58.3884335Z model : 85 2025-05-07T19:42:58.3884484Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3884602Z stepping : 7 2025-05-07T19:42:58.3884681Z microcode : 0x5003901 2025-05-07T19:42:58.3884754Z cpu MHz : 3000.006 2025-05-07T19:42:58.3884835Z cache size : 36608 KB 2025-05-07T19:42:58.3884913Z physical id : 1 2025-05-07T19:42:58.3884991Z siblings : 48 2025-05-07T19:42:58.3885068Z core id : 15 2025-05-07T19:42:58.3885140Z cpu cores : 24 2025-05-07T19:42:58.3885218Z apicid : 95 2025-05-07T19:42:58.3885294Z initial apicid : 95 2025-05-07T19:42:58.3885375Z fpu : yes 2025-05-07T19:42:58.3885455Z fpu_exception : yes 2025-05-07T19:42:58.3885531Z cpuid level : 13 2025-05-07T19:42:58.3885612Z wp : yes 2025-05-07T19:42:58.3887645Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3888012Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3888107Z bogomips : 6000.01 2025-05-07T19:42:58.3888185Z clflush size : 64 2025-05-07T19:42:58.3888262Z cache_alignment : 64 2025-05-07T19:42:58.3888394Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3888475Z power management: 2025-05-07T19:42:58.3888479Z 2025-05-07T19:42:58.3888551Z processor : 88 2025-05-07T19:42:58.3888633Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3888734Z cpu family : 6 2025-05-07T19:42:58.3888813Z model : 85 2025-05-07T19:42:58.3888960Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3889049Z stepping : 7 2025-05-07T19:42:58.3889132Z microcode : 0x5003901 2025-05-07T19:42:58.3889209Z cpu MHz : 1199.952 2025-05-07T19:42:58.3889286Z cache size : 36608 KB 2025-05-07T19:42:58.3889378Z physical id : 1 2025-05-07T19:42:58.3889450Z siblings : 48 2025-05-07T19:42:58.3889522Z core id : 16 2025-05-07T19:42:58.3889610Z cpu cores : 24 2025-05-07T19:42:58.3889680Z apicid : 97 2025-05-07T19:42:58.3889758Z initial apicid : 97 2025-05-07T19:42:58.3889828Z fpu : yes 2025-05-07T19:42:58.3889916Z fpu_exception : yes 2025-05-07T19:42:58.3889993Z cpuid level : 13 2025-05-07T19:42:58.3890067Z wp : yes 2025-05-07T19:42:58.3892114Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3892481Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3892613Z bogomips : 6000.01 2025-05-07T19:42:58.3892692Z clflush size : 64 2025-05-07T19:42:58.3892767Z cache_alignment : 64 2025-05-07T19:42:58.3892887Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3892976Z power management: 2025-05-07T19:42:58.3892980Z 2025-05-07T19:42:58.3893060Z processor : 89 2025-05-07T19:42:58.3893141Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3893212Z cpu family : 6 2025-05-07T19:42:58.3893282Z model : 85 2025-05-07T19:42:58.3893428Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3893498Z stepping : 7 2025-05-07T19:42:58.3893594Z microcode : 0x5003901 2025-05-07T19:42:58.3893945Z cpu MHz : 1199.437 2025-05-07T19:42:58.3894023Z cache size : 36608 KB 2025-05-07T19:42:58.3894096Z physical id : 1 2025-05-07T19:42:58.3894178Z siblings : 48 2025-05-07T19:42:58.3894244Z core id : 17 2025-05-07T19:42:58.3894318Z cpu cores : 24 2025-05-07T19:42:58.3894405Z apicid : 99 2025-05-07T19:42:58.3894479Z initial apicid : 99 2025-05-07T19:42:58.3894548Z fpu : yes 2025-05-07T19:42:58.3894624Z fpu_exception : yes 2025-05-07T19:42:58.3894709Z cpuid level : 13 2025-05-07T19:42:58.3894780Z wp : yes 2025-05-07T19:42:58.3896831Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3897275Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3897357Z bogomips : 6000.01 2025-05-07T19:42:58.3897598Z clflush size : 64 2025-05-07T19:42:58.3897692Z cache_alignment : 64 2025-05-07T19:42:58.3897820Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3897906Z power management: 2025-05-07T19:42:58.3897910Z 2025-05-07T19:42:58.3898001Z processor : 90 2025-05-07T19:42:58.3898087Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3898162Z cpu family : 6 2025-05-07T19:42:58.3898236Z model : 85 2025-05-07T19:42:58.3898455Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3898530Z stepping : 7 2025-05-07T19:42:58.3898612Z microcode : 0x5003901 2025-05-07T19:42:58.3898709Z cpu MHz : 1200.606 2025-05-07T19:42:58.3898794Z cache size : 36608 KB 2025-05-07T19:42:58.3898872Z physical id : 1 2025-05-07T19:42:58.3898949Z siblings : 48 2025-05-07T19:42:58.3899045Z core id : 18 2025-05-07T19:42:58.3899120Z cpu cores : 24 2025-05-07T19:42:58.3899200Z apicid : 101 2025-05-07T19:42:58.3899305Z initial apicid : 101 2025-05-07T19:42:58.3899383Z fpu : yes 2025-05-07T19:42:58.3899462Z fpu_exception : yes 2025-05-07T19:42:58.3899535Z cpuid level : 13 2025-05-07T19:42:58.3899619Z wp : yes 2025-05-07T19:42:58.3901830Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3902227Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3902365Z bogomips : 6000.01 2025-05-07T19:42:58.3902442Z clflush size : 64 2025-05-07T19:42:58.3902521Z cache_alignment : 64 2025-05-07T19:42:58.3902654Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3902734Z power management: 2025-05-07T19:42:58.3902738Z 2025-05-07T19:42:58.3902814Z processor : 91 2025-05-07T19:42:58.3902902Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3902975Z cpu family : 6 2025-05-07T19:42:58.3903049Z model : 85 2025-05-07T19:42:58.3903202Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3903285Z stepping : 7 2025-05-07T19:42:58.3903370Z microcode : 0x5003901 2025-05-07T19:42:58.3903451Z cpu MHz : 1201.000 2025-05-07T19:42:58.3903549Z cache size : 36608 KB 2025-05-07T19:42:58.3903685Z physical id : 1 2025-05-07T19:42:58.3903771Z siblings : 48 2025-05-07T19:42:58.3903855Z core id : 19 2025-05-07T19:42:58.3903946Z cpu cores : 24 2025-05-07T19:42:58.3904023Z apicid : 103 2025-05-07T19:42:58.3904109Z initial apicid : 103 2025-05-07T19:42:58.3904204Z fpu : yes 2025-05-07T19:42:58.3904285Z fpu_exception : yes 2025-05-07T19:42:58.3904372Z cpuid level : 13 2025-05-07T19:42:58.3904448Z wp : yes 2025-05-07T19:42:58.3906673Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3907067Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3907156Z bogomips : 6000.01 2025-05-07T19:42:58.3907240Z clflush size : 64 2025-05-07T19:42:58.3907322Z cache_alignment : 64 2025-05-07T19:42:58.3907452Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3907541Z power management: 2025-05-07T19:42:58.3907545Z 2025-05-07T19:42:58.3907629Z processor : 92 2025-05-07T19:42:58.3907717Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3907805Z cpu family : 6 2025-05-07T19:42:58.3907877Z model : 85 2025-05-07T19:42:58.3908038Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3908119Z stepping : 7 2025-05-07T19:42:58.3908215Z microcode : 0x5003901 2025-05-07T19:42:58.3908292Z cpu MHz : 1200.914 2025-05-07T19:42:58.3908379Z cache size : 36608 KB 2025-05-07T19:42:58.3908474Z physical id : 1 2025-05-07T19:42:58.3908550Z siblings : 48 2025-05-07T19:42:58.3908629Z core id : 20 2025-05-07T19:42:58.3908708Z cpu cores : 24 2025-05-07T19:42:58.3908799Z apicid : 105 2025-05-07T19:42:58.3908883Z initial apicid : 105 2025-05-07T19:42:58.3908956Z fpu : yes 2025-05-07T19:42:58.3909046Z fpu_exception : yes 2025-05-07T19:42:58.3909135Z cpuid level : 13 2025-05-07T19:42:58.3909216Z wp : yes 2025-05-07T19:42:58.3911406Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3911780Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3911853Z bogomips : 6000.01 2025-05-07T19:42:58.3911936Z clflush size : 64 2025-05-07T19:42:58.3912057Z cache_alignment : 64 2025-05-07T19:42:58.3912174Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3912252Z power management: 2025-05-07T19:42:58.3912256Z 2025-05-07T19:42:58.3912344Z processor : 93 2025-05-07T19:42:58.3912430Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3912501Z cpu family : 6 2025-05-07T19:42:58.3912582Z model : 85 2025-05-07T19:42:58.3912733Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3912810Z stepping : 7 2025-05-07T19:42:58.3912888Z microcode : 0x5003901 2025-05-07T19:42:58.3912974Z cpu MHz : 1199.795 2025-05-07T19:42:58.3913053Z cache size : 36608 KB 2025-05-07T19:42:58.3913129Z physical id : 1 2025-05-07T19:42:58.3913209Z siblings : 48 2025-05-07T19:42:58.3913282Z core id : 21 2025-05-07T19:42:58.3913404Z cpu cores : 24 2025-05-07T19:42:58.3913477Z apicid : 107 2025-05-07T19:42:58.3913574Z initial apicid : 107 2025-05-07T19:42:58.3913643Z fpu : yes 2025-05-07T19:42:58.3913718Z fpu_exception : yes 2025-05-07T19:42:58.3913792Z cpuid level : 13 2025-05-07T19:42:58.3913881Z wp : yes 2025-05-07T19:42:58.3915920Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3916298Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3916377Z bogomips : 6000.01 2025-05-07T19:42:58.3916458Z clflush size : 64 2025-05-07T19:42:58.3916537Z cache_alignment : 64 2025-05-07T19:42:58.3916670Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3916749Z power management: 2025-05-07T19:42:58.3916753Z 2025-05-07T19:42:58.3916829Z processor : 94 2025-05-07T19:42:58.3916920Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3916996Z cpu family : 6 2025-05-07T19:42:58.3917064Z model : 85 2025-05-07T19:42:58.3917224Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3917296Z stepping : 7 2025-05-07T19:42:58.3917376Z microcode : 0x5003901 2025-05-07T19:42:58.3917445Z cpu MHz : 1199.587 2025-05-07T19:42:58.3917531Z cache size : 36608 KB 2025-05-07T19:42:58.3917609Z physical id : 1 2025-05-07T19:42:58.3917681Z siblings : 48 2025-05-07T19:42:58.3917748Z core id : 22 2025-05-07T19:42:58.3917832Z cpu cores : 24 2025-05-07T19:42:58.3917903Z apicid : 109 2025-05-07T19:42:58.3917992Z initial apicid : 109 2025-05-07T19:42:58.3918070Z fpu : yes 2025-05-07T19:42:58.3918153Z fpu_exception : yes 2025-05-07T19:42:58.3918226Z cpuid level : 13 2025-05-07T19:42:58.3918300Z wp : yes 2025-05-07T19:42:58.3920357Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3920720Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3920807Z bogomips : 6000.01 2025-05-07T19:42:58.3920880Z clflush size : 64 2025-05-07T19:42:58.3920957Z cache_alignment : 64 2025-05-07T19:42:58.3921072Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3921204Z power management: 2025-05-07T19:42:58.3921208Z 2025-05-07T19:42:58.3921280Z processor : 95 2025-05-07T19:42:58.3921364Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3921446Z cpu family : 6 2025-05-07T19:42:58.3921523Z model : 85 2025-05-07T19:42:58.3921668Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3921743Z stepping : 7 2025-05-07T19:42:58.3921835Z microcode : 0x5003901 2025-05-07T19:42:58.3921913Z cpu MHz : 3000.006 2025-05-07T19:42:58.3921987Z cache size : 36608 KB 2025-05-07T19:42:58.3922076Z physical id : 1 2025-05-07T19:42:58.3922151Z siblings : 48 2025-05-07T19:42:58.3922222Z core id : 23 2025-05-07T19:42:58.3922295Z cpu cores : 24 2025-05-07T19:42:58.3922388Z apicid : 111 2025-05-07T19:42:58.3922520Z initial apicid : 111 2025-05-07T19:42:58.3922589Z fpu : yes 2025-05-07T19:42:58.3922676Z fpu_exception : yes 2025-05-07T19:42:58.3922753Z cpuid level : 13 2025-05-07T19:42:58.3922828Z wp : yes 2025-05-07T19:42:58.3924905Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:42:58.3925268Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3925353Z bogomips : 6000.01 2025-05-07T19:42:58.3925445Z clflush size : 64 2025-05-07T19:42:58.3925530Z cache_alignment : 64 2025-05-07T19:42:58.3925655Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3925740Z power management: 2025-05-07T19:42:58.3925744Z 2025-05-07T19:42:58.3925757Z 2025-05-07T19:42:58.3925868Z ################################################################################ 2025-05-07T19:42:58.3925959Z [INFO] Print PCI info ... 2025-05-07T19:42:58.3926033Z + lspci -v 2025-05-07T19:42:58.3926037Z 2025-05-07T19:42:58.3926223Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-05-07T19:42:58.3926332Z Subsystem: Amazon.com, Inc. Device 1237 2025-05-07T19:42:58.3926443Z Flags: bus master, medium devsel, latency 0 2025-05-07T19:42:58.3926448Z 2025-05-07T19:42:58.3926656Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-05-07T19:42:58.3926739Z Physical Slot: 1 2025-05-07T19:42:58.3926851Z Flags: bus master, fast devsel, latency 0 2025-05-07T19:42:58.3926856Z 2025-05-07T19:42:58.3927111Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-05-07T19:42:58.3927198Z Physical Slot: 1 2025-05-07T19:42:58.3927334Z Flags: bus master, fast devsel, latency 0, IRQ 9 2025-05-07T19:42:58.3927339Z 2025-05-07T19:42:58.3927613Z 00:03.0 VGA compatible controller: Amazon.com, Inc. 
Device 1111 (prog-if 00 [VGA controller]) 2025-05-07T19:42:58.3927692Z Physical Slot: 3 2025-05-07T19:42:58.3927795Z Flags: bus master, fast devsel, latency 0 2025-05-07T19:42:58.3927928Z Memory at c0000000 (32-bit, prefetchable) [size=4M] 2025-05-07T19:42:58.3928070Z Expansion ROM at 000c0000 [disabled] [size=128K] 2025-05-07T19:42:58.3928074Z 2025-05-07T19:42:58.3928377Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller (prog-if 02 [NVM Express]) 2025-05-07T19:42:58.3928483Z Subsystem: Amazon.com, Inc. Device 0000 2025-05-07T19:42:58.3928573Z Physical Slot: 4 2025-05-07T19:42:58.3928692Z Flags: bus master, fast devsel, latency 0, IRQ 11 2025-05-07T19:42:58.3928839Z Memory at c0514000 (32-bit, non-prefetchable) [size=16K] 2025-05-07T19:42:58.3928952Z Capabilities: 2025-05-07T19:42:58.3929088Z Kernel driver in use: nvme 2025-05-07T19:42:58.3929093Z 2025-05-07T19:42:58.3929297Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-05-07T19:42:58.3929379Z Physical Slot: 5 2025-05-07T19:42:58.3929495Z Flags: bus master, fast devsel, latency 0 2025-05-07T19:42:58.3929644Z Memory at c0510000 (32-bit, non-prefetchable) [size=16K] 2025-05-07T19:42:58.3929766Z Memory at c0400000 (32-bit, prefetchable) [size=1M] 2025-05-07T19:42:58.3929930Z Memory at c0500000 (32-bit, non-prefetchable) [size=64K] 2025-05-07T19:42:58.3930022Z Capabilities: 2025-05-07T19:42:58.3930116Z Kernel driver in use: ena 2025-05-07T19:42:58.3930120Z 2025-05-07T19:42:58.3930124Z 2025-05-07T19:42:58.3930343Z ################################################################################ 2025-05-07T19:42:58.3930451Z [INFO] Print Linux distribution info ... 
2025-05-07T19:42:58.3930528Z + uname -a 2025-05-07T19:42:58.3930532Z 2025-05-07T19:42:58.3930920Z Linux 684dd9e27594 6.1.130-139.222.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Mar 11 01:10:58 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux 2025-05-07T19:42:58.3930928Z 2025-05-07T19:42:58.3931009Z + uname -m 2025-05-07T19:42:58.3931013Z 2025-05-07T19:42:58.3931085Z x86_64 2025-05-07T19:42:58.3931090Z 2025-05-07T19:42:58.3931173Z + cat /proc/version 2025-05-07T19:42:58.3931177Z 2025-05-07T19:42:58.3931740Z Linux version 6.1.130-139.222.amzn2023.x86_64 (mockbuild@ip-10-0-55-76) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5), GNU ld version 2.39-6.amzn2023.0.11) #1 SMP PREEMPT_DYNAMIC Tue Mar 11 01:10:58 UTC 2025 2025-05-07T19:42:58.3931744Z 2025-05-07T19:42:58.3931822Z + cat /etc/os-release 2025-05-07T19:42:58.3931825Z 2025-05-07T19:42:58.3931930Z NAME="Amazon Linux" 2025-05-07T19:42:58.3932012Z VERSION="2023" 2025-05-07T19:42:58.3932088Z ID="amzn" 2025-05-07T19:42:58.3932163Z ID_LIKE="fedora" 2025-05-07T19:42:58.3932257Z VERSION_ID="2023" 2025-05-07T19:42:58.3932354Z PLATFORM_ID="platform:al2023" 2025-05-07T19:42:58.3932461Z PRETTY_NAME="Amazon Linux 2023.7.20250428" 2025-05-07T19:42:58.3932551Z ANSI_COLOR="0;33" 2025-05-07T19:42:58.3932666Z CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2023" 2025-05-07T19:42:58.3932836Z HOME_URL="https://aws.amazon.com/linux/amazon-linux-2023/" 2025-05-07T19:42:58.3932998Z DOCUMENTATION_URL="https://docs.aws.amazon.com/linux/" 2025-05-07T19:42:58.3933162Z SUPPORT_URL="https://aws.amazon.com/premiumsupport/" 2025-05-07T19:42:58.3933347Z BUG_REPORT_URL="https://github.com/amazonlinux/amazon-linux-2023" 2025-05-07T19:42:58.3933424Z VENDOR_NAME="AWS" 2025-05-07T19:42:58.3933535Z VENDOR_URL="https://aws.amazon.com/" 2025-05-07T19:42:58.3933622Z SUPPORT_END="2029-06-30" 2025-05-07T19:42:58.3933625Z 2025-05-07T19:42:58.3968923Z ##[group]Run . $PRELUDE; print_gpu_info 2025-05-07T19:42:58.3969077Z . 
$PRELUDE; print_gpu_info 2025-05-07T19:42:58.3969357Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:42:58.3969435Z env: 2025-05-07T19:42:58.3969550Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:42:58.3969655Z BUILD_ENV: build_binary 2025-05-07T19:42:58.3969769Z BUILD_TARGET: genai 2025-05-07T19:42:58.3969851Z BUILD_VARIANT: cuda 2025-05-07T19:42:58.3969946Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:42:58.3970054Z ##[endgroup] 2025-05-07T19:42:58.7743930Z ################################################################################ 2025-05-07T19:42:58.7744400Z [INFO] Printing general display info ... 2025-05-07T19:42:58.7762101Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:42:58.8693545Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:42:58.8700179Z /usr/bin/sudo 2025-05-07T19:42:58.8711137Z which: no apt-get in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:42:58.8723154Z /usr/bin/yum 2025-05-07T19:42:58.8725474Z [INSTALL] Updating system repositories ... 2025-05-07T19:42:58.8749569Z [EXEC] [ATTEMPT 0/3] + sudo yum update -y 2025-05-07T19:42:59.0918411Z Last metadata expiration check: 0:00:18 ago on Wed May 7 19:42:41 2025. 2025-05-07T19:42:59.1879893Z Dependencies resolved. 2025-05-07T19:42:59.2092382Z Nothing to do. 2025-05-07T19:42:59.2093373Z Complete! 2025-05-07T19:42:59.2769661Z [INSTALL] Installing system package(s): hostname lshw ... 2025-05-07T19:42:59.2797327Z [EXEC] [ATTEMPT 0/3] + sudo yum install -y hostname lshw 2025-05-07T19:42:59.5016622Z Last metadata expiration check: 0:00:18 ago on Wed May 7 19:42:41 2025. 2025-05-07T19:42:59.5531614Z Dependencies resolved. 
2025-05-07T19:42:59.5698039Z ================================================================================ 2025-05-07T19:42:59.5699217Z Package Arch Version Repository Size 2025-05-07T19:42:59.5699731Z ================================================================================ 2025-05-07T19:42:59.5700068Z Installing: 2025-05-07T19:42:59.5700415Z hostname x86_64 3.23-4.amzn2023.0.3 amazonlinux 28 k 2025-05-07T19:42:59.5700928Z lshw x86_64 B.02.19.2-7.amzn2023.0.3 amazonlinux 319 k 2025-05-07T19:42:59.5701244Z 2025-05-07T19:42:59.5701341Z Transaction Summary 2025-05-07T19:42:59.5701728Z ================================================================================ 2025-05-07T19:42:59.5702164Z Install 2 Packages 2025-05-07T19:42:59.5702311Z 2025-05-07T19:42:59.5702408Z Total download size: 347 k 2025-05-07T19:42:59.5702664Z Installed size: 883 k 2025-05-07T19:42:59.5702908Z Downloading Packages: 2025-05-07T19:42:59.8699362Z (1/2): hostname-3.23-4.amzn2023.0.3.x86_64.rpm 1.6 MB/s | 28 kB 00:00 2025-05-07T19:42:59.8817728Z (2/2): lshw-B.02.19.2-7.amzn2023.0.3.x86_64.rpm 11 MB/s | 319 kB 00:00 2025-05-07T19:42:59.8823566Z -------------------------------------------------------------------------------- 2025-05-07T19:42:59.8827506Z Total 1.1 MB/s | 347 kB 00:00 2025-05-07T19:42:59.9064588Z Running transaction check 2025-05-07T19:42:59.9123553Z Transaction check succeeded. 2025-05-07T19:42:59.9124535Z Running transaction test 2025-05-07T19:42:59.9288054Z Transaction test succeeded. 
2025-05-07T19:42:59.9288439Z Running transaction 2025-05-07T19:42:59.9556605Z Preparing : 1/1 2025-05-07T19:42:59.9626206Z Installing : lshw-B.02.19.2-7.amzn2023.0.3.x86_64 1/2 2025-05-07T19:42:59.9660468Z Installing : hostname-3.23-4.amzn2023.0.3.x86_64 2/2 2025-05-07T19:43:01.0089828Z Running scriptlet: hostname-3.23-4.amzn2023.0.3.x86_64 2/2 2025-05-07T19:43:01.0090819Z Verifying : hostname-3.23-4.amzn2023.0.3.x86_64 1/2 2025-05-07T19:43:01.0463143Z Verifying : lshw-B.02.19.2-7.amzn2023.0.3.x86_64 2/2 2025-05-07T19:43:01.0463512Z 2025-05-07T19:43:01.0464419Z Installed: 2025-05-07T19:43:01.0464924Z hostname-3.23-4.amzn2023.0.3.x86_64 lshw-B.02.19.2-7.amzn2023.0.3.x86_64 2025-05-07T19:43:01.0465446Z 2025-05-07T19:43:01.0465549Z Complete! 2025-05-07T19:43:01.0878263Z + hostname 2025-05-07T19:43:01.0878627Z 2025-05-07T19:43:01.0890062Z 684dd9e27594 2025-05-07T19:43:01.0890383Z 2025-05-07T19:43:01.0890593Z + sudo lshw -C display 2025-05-07T19:43:01.0890811Z 2025-05-07T19:43:01.2888092Z *-display UNCLAIMED 2025-05-07T19:43:01.2888585Z description: VGA compatible controller 2025-05-07T19:43:01.2888965Z product: Amazon.com, Inc. 2025-05-07T19:43:01.2889306Z vendor: Amazon.com, Inc. 2025-05-07T19:43:01.2889601Z physical id: 3 2025-05-07T19:43:01.2889895Z bus info: pci@0000:00:03.0 2025-05-07T19:43:01.2890221Z version: 00 2025-05-07T19:43:01.2890488Z width: 32 bits 2025-05-07T19:43:01.2890764Z clock: 33MHz 2025-05-07T19:43:01.2891040Z capabilities: vga_controller bus_master 2025-05-07T19:43:01.2891417Z configuration: latency=0 2025-05-07T19:43:01.2891777Z resources: memory:c0000000-c03fffff memory:c0000-dffff 2025-05-07T19:43:01.2905565Z 2025-05-07T19:43:01.2905741Z ################################################################################ 2025-05-07T19:43:01.2906213Z [INFO] Printing NVIDIA GPU info ... 
2025-05-07T19:43:01.3016119Z lspci: Unable to load libkmod resources: error -2 2025-05-07T19:43:01.3037244Z which: no nvidia-smi in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:01.3038713Z [CHECK] nvidia-smi not found 2025-05-07T19:43:01.3039580Z ################################################################################ 2025-05-07T19:43:01.3040571Z [INFO] Printing AMD GPU info ... 2025-05-07T19:43:01.3170644Z lspci: Unable to load libkmod resources: error -2 2025-05-07T19:43:01.3198343Z which: no rocminfo in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:01.3198911Z [CHECK] rocminfo not found 2025-05-07T19:43:01.3205815Z which: no rocm-smi in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:01.3206338Z [CHECK] rocm-smi not found 2025-05-07T19:43:01.3307001Z ##[group]Run . $PRELUDE; setup_miniconda $HOME/miniconda 2025-05-07T19:43:01.3307511Z . $PRELUDE; setup_miniconda $HOME/miniconda 2025-05-07T19:43:01.3308079Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:43:01.3308435Z env: 2025-05-07T19:43:01.3308696Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:43:01.3309008Z BUILD_ENV: build_binary 2025-05-07T19:43:01.3309290Z BUILD_TARGET: genai 2025-05-07T19:43:01.3309531Z BUILD_VARIANT: cuda 2025-05-07T19:43:01.3309800Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:43:01.3310057Z ##[endgroup] 2025-05-07T19:43:01.7943112Z ################################################################################ 2025-05-07T19:43:01.7944220Z # Setup Miniconda 2025-05-07T19:43:01.7944847Z # 2025-05-07T19:43:01.7960495Z # [2025-05-07T19:43:01.795Z] + setup_miniconda /github/home/miniconda 2025-05-07T19:43:01.7961820Z ################################################################################ 2025-05-07T19:43:01.7962695Z 2025-05-07T19:43:01.7973220Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:43:01.8876907Z [CHECK] Network 
does not appear to be blocked. 2025-05-07T19:43:01.8878044Z + mkdir -p /github/home/miniconda 2025-05-07T19:43:01.8878629Z 2025-05-07T19:43:01.8891736Z 2025-05-07T19:43:01.8892635Z [SETUP] Downloading the Miniconda installer ... 2025-05-07T19:43:01.8912544Z [EXEC] [ATTEMPT 0/3] + wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh 2025-05-07T19:43:03.4165991Z [SETUP] Installing Miniconda ... 2025-05-07T19:43:03.4167184Z + bash miniconda.sh -b -p /github/home/miniconda -u 2025-05-07T19:43:03.4168373Z 2025-05-07T19:43:03.4314785Z PREFIX=/github/home/miniconda 2025-05-07T19:43:03.7844873Z Unpacking payload ... 2025-05-07T19:43:04.2620988Z entry_point.py:256: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. 2025-05-07T19:43:04.9307797Z entry_point.py:256: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. 2025-05-07T19:43:06.7755675Z 2025-05-07T19:43:06.7756165Z Installing base environment... 2025-05-07T19:43:07.7613455Z 2025-05-07T19:43:07.7614009Z Preparing transaction: ...working... done 2025-05-07T19:43:10.5961696Z Executing transaction: ...working... done 2025-05-07T19:43:11.1407700Z entry_point.py:256: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. 2025-05-07T19:43:11.2097301Z installation finished. 2025-05-07T19:43:11.2105253Z 2025-05-07T19:43:11.2106195Z + rm -f miniconda.sh 2025-05-07T19:43:11.2106735Z 2025-05-07T19:43:11.2260080Z 2025-05-07T19:43:11.2260326Z [SETUP] Reloading the bash configuration ... 
2025-05-07T19:43:11.2261167Z + /github/home/miniconda/bin/conda init bash 2025-05-07T19:43:11.2261420Z 2025-05-07T19:43:11.5955069Z no change /github/home/miniconda/condabin/conda 2025-05-07T19:43:11.5955500Z no change /github/home/miniconda/bin/conda 2025-05-07T19:43:11.5955932Z no change /github/home/miniconda/bin/conda-env 2025-05-07T19:43:11.5956355Z no change /github/home/miniconda/bin/activate 2025-05-07T19:43:11.5956742Z no change /github/home/miniconda/bin/deactivate 2025-05-07T19:43:11.5957194Z no change /github/home/miniconda/etc/profile.d/conda.sh 2025-05-07T19:43:11.5957670Z no change /github/home/miniconda/etc/fish/conf.d/conda.fish 2025-05-07T19:43:11.5958172Z no change /github/home/miniconda/shell/condabin/Conda.psm1 2025-05-07T19:43:11.5958668Z no change /github/home/miniconda/shell/condabin/conda-hook.ps1 2025-05-07T19:43:11.5959403Z no change /github/home/miniconda/lib/python3.13/site-packages/xontrib/conda.xsh 2025-05-07T19:43:11.5960305Z no change /github/home/miniconda/etc/profile.d/conda.csh 2025-05-07T19:43:11.5960761Z modified /github/home/.bashrc 2025-05-07T19:43:11.5960964Z 2025-05-07T19:43:11.5961219Z ==> For changes to take effect, close and re-open your current shell. <== 2025-05-07T19:43:11.5961653Z 2025-05-07T19:43:11.6497441Z 2025-05-07T19:43:11.6498135Z + . /github/home/.bashrc 2025-05-07T19:43:11.6498707Z 2025-05-07T19:43:12.4348549Z 2025-05-07T19:43:12.4349563Z [SETUP] Installing libmamba-solver (required since Anaconda 2024.02-1) and libarchive ... 
2025-05-07T19:43:12.4372651Z [EXEC] [ATTEMPT 0/3] + conda install --solver=classic -c conda-forge --override-channels -y conda-libmamba-solver libmamba libmambapy libarchive 2025-05-07T19:43:24.2021998Z Collecting package metadata (current_repodata.json): - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - done 2025-05-07T19:43:25.6590022Z Solving environment: | / - \ | / - \ | / - done 2025-05-07T19:43:25.7494468Z 2025-05-07T19:43:25.7495043Z ## Package Plan ## 2025-05-07T19:43:25.7495606Z 2025-05-07T19:43:25.7496043Z environment location: /github/home/miniconda 2025-05-07T19:43:25.7496774Z 2025-05-07T19:43:25.7497311Z added / updated specs: 2025-05-07T19:43:25.7498146Z - conda-libmamba-solver 2025-05-07T19:43:25.7498888Z - libarchive 2025-05-07T19:43:25.7499516Z - libmamba 2025-05-07T19:43:25.7500094Z - libmambapy 2025-05-07T19:43:25.7500490Z 2025-05-07T19:43:25.7500502Z 2025-05-07T19:43:25.7500878Z The following packages will be downloaded: 2025-05-07T19:43:25.7501293Z 2025-05-07T19:43:25.7501450Z package | build 2025-05-07T19:43:25.7501814Z ---------------------------|----------------- 2025-05-07T19:43:25.7502332Z ca-certificates-2025.4.26 | hbd8a1cb_0 149 KB conda-forge 2025-05-07T19:43:25.7502869Z certifi-2025.4.26 | pyhd8ed1ab_0 154 KB conda-forge 2025-05-07T19:43:25.7503380Z conda-25.3.1 | py313h78bf25f_1 1.1 MB conda-forge 2025-05-07T19:43:25.7503929Z conda-libmamba-solver-25.4.0| pyhd8ed1ab_0 41 KB conda-forge 2025-05-07T19:43:25.7504435Z ------------------------------------------------------------ 2025-05-07T19:43:25.7504841Z Total: 1.4 MB 2025-05-07T19:43:25.7505078Z 2025-05-07T19:43:25.7505212Z The following packages will be UPDATED: 2025-05-07T19:43:25.7505472Z 2025-05-07T19:43:25.7510446Z ca-certificates pkgs/main/linux-64::ca-certificates-2~ --> conda-forge/noarch::ca-certificates-2025.4.26-hbd8a1cb_0 
2025-05-07T19:43:25.7511470Z conda pkgs/main::conda-25.3.1-py313h06a4308~ --> conda-forge::conda-25.3.1-py313h78bf25f_1 2025-05-07T19:43:25.7512173Z 2025-05-07T19:43:25.7512423Z The following packages will be SUPERSEDED by a higher-priority channel: 2025-05-07T19:43:25.7512802Z 2025-05-07T19:43:25.7513160Z certifi pkgs/main/linux-64::certifi-2025.4.26~ --> conda-forge/noarch::certifi-2025.4.26-pyhd8ed1ab_0 2025-05-07T19:43:25.7514082Z conda-libmamba-so~ pkgs/main::conda-libmamba-solver-25.4~ --> conda-forge::conda-libmamba-solver-25.4.0-pyhd8ed1ab_0 2025-05-07T19:43:25.7514619Z 2025-05-07T19:43:25.7514623Z 2025-05-07T19:43:25.7514627Z 2025-05-07T19:43:25.7514791Z Downloading and Extracting Packages: ...working... 2025-05-07T19:43:25.7515221Z conda-25.3.1 | 1.1 MB | | 0% 2025-05-07T19:43:25.7515495Z 2025-05-07T19:43:25.7515851Z certifi-2025.4.26 | 154 KB | | 0%  2025-05-07T19:43:25.7516237Z 2025-05-07T19:43:25.7516240Z 2025-05-07T19:43:25.7516481Z ca-certificates-2025 | 149 KB | | 0%  2025-05-07T19:43:25.7516756Z 2025-05-07T19:43:25.7516792Z 2025-05-07T19:43:25.7516948Z 2025-05-07T19:43:25.8130291Z conda-libmamba-solve | 41 KB | | 0%  2025-05-07T19:43:25.8131258Z 2025-05-07T19:43:25.8188451Z certifi-2025.4.26 | 154 KB | ########## | 100%  2025-05-07T19:43:25.8189298Z 2025-05-07T19:43:25.8189312Z 2025-05-07T19:43:25.8189322Z 2025-05-07T19:43:25.8294392Z conda-libmamba-solve | 41 KB | ########## | 100%  2025-05-07T19:43:25.8294773Z 2025-05-07T19:43:25.8316352Z certifi-2025.4.26 | 154 KB | ########## | 100%  2025-05-07T19:43:25.8317090Z 2025-05-07T19:43:25.8317151Z 2025-05-07T19:43:25.8488842Z ca-certificates-2025 | 149 KB | ########## | 100%  2025-05-07T19:43:25.8489410Z 2025-05-07T19:43:25.8489429Z 2025-05-07T19:43:25.8489477Z 2025-05-07T19:43:25.8497529Z conda-libmamba-solve | 41 KB | ########## | 100%  2025-05-07T19:43:25.8498470Z 2025-05-07T19:43:25.8498482Z 2025-05-07T19:43:25.8522752Z ca-certificates-2025 | 149 KB | ########## | 100%  
2025-05-07T19:43:25.8523263Z conda-25.3.1 | 1.1 MB | ########## | 100% 2025-05-07T19:43:25.9497055Z conda-25.3.1 | 1.1 MB | ########## | 100% 2025-05-07T19:43:25.9503421Z conda-25.3.1 | 1.1 MB | ########## | 100% 2025-05-07T19:43:25.9504463Z 2025-05-07T19:43:25.9505084Z 2025-05-07T19:43:25.9505754Z  2025-05-07T19:43:25.9506384Z 2025-05-07T19:43:25.9506396Z 2025-05-07T19:43:25.9506834Z  2025-05-07T19:43:25.9507097Z 2025-05-07T19:43:25.9507101Z 2025-05-07T19:43:25.9507104Z 2025-05-07T19:43:25.9507294Z  done 2025-05-07T19:43:26.0514625Z Preparing transaction: | done 2025-05-07T19:43:26.1523324Z Verifying transaction: - done 2025-05-07T19:43:27.4550371Z Executing transaction: | / - \ | / - \ | / - \ | done 2025-05-07T19:43:29.0344346Z [SETUP] Updating Miniconda base packages ... 2025-05-07T19:43:29.0370019Z [EXEC] [ATTEMPT 0/3] + conda update -n base -c defaults --update-deps -y conda 2025-05-07T19:43:29.7555184Z Channels: 2025-05-07T19:43:29.7555529Z - defaults 2025-05-07T19:43:29.7555772Z Platform: linux-64 2025-05-07T19:43:30.8072602Z Collecting package metadata (repodata.json): - \ | / - \ done 2025-05-07T19:43:30.9373558Z Solving environment: / - Channels: 2025-05-07T19:43:30.9373974Z - defaults 2025-05-07T19:43:30.9374241Z Platform: linux-64 2025-05-07T19:43:31.2197488Z Collecting package metadata (repodata.json): | / - \ done 2025-05-07T19:43:31.4278531Z Solving environment: / - \ done 2025-05-07T19:43:31.5486399Z | done 2025-05-07T19:43:31.6121204Z 2025-05-07T19:43:31.6121633Z ## Package Plan ## 2025-05-07T19:43:31.6121844Z 2025-05-07T19:43:31.6122035Z environment location: /github/home/miniconda 2025-05-07T19:43:31.6122301Z 2025-05-07T19:43:31.6122412Z added / updated specs: 2025-05-07T19:43:31.6123067Z - conda 2025-05-07T19:43:31.6123201Z 2025-05-07T19:43:31.6123205Z 2025-05-07T19:43:31.6123344Z The following packages will be downloaded: 2025-05-07T19:43:31.6123596Z 2025-05-07T19:43:31.6123725Z package | build 2025-05-07T19:43:31.6124114Z 
---------------------------|----------------- 2025-05-07T19:43:31.6124491Z pip-25.1 | pyhc872135_2 1.3 MB 2025-05-07T19:43:31.6124911Z tzdata-2025b | h04d1e81_0 116 KB 2025-05-07T19:43:31.6125315Z ------------------------------------------------------------ 2025-05-07T19:43:31.6125697Z Total: 1.4 MB 2025-05-07T19:43:31.6125925Z 2025-05-07T19:43:31.6126045Z The following packages will be UPDATED: 2025-05-07T19:43:31.6126300Z 2025-05-07T19:43:31.6126647Z pip pkgs/main/linux-64::pip-25.0-py313h06~ --> pkgs/main/noarch::pip-25.1-pyhc872135_2 2025-05-07T19:43:31.6127363Z tzdata 2025a-h04d1e81_0 --> 2025b-h04d1e81_0 2025-05-07T19:43:31.6127650Z 2025-05-07T19:43:31.6127653Z 2025-05-07T19:43:31.6127657Z 2025-05-07T19:43:31.6127820Z Downloading and Extracting Packages: ...working... 2025-05-07T19:43:31.6128258Z pip-25.1 | 1.3 MB | | 0% 2025-05-07T19:43:31.6128509Z 2025-05-07T19:43:31.6501751Z tzdata-2025b | 116 KB | | 0%  2025-05-07T19:43:31.6502579Z 2025-05-07T19:43:31.6809999Z tzdata-2025b | 116 KB | ########## | 100%  2025-05-07T19:43:31.9142010Z pip-25.1 | 1.3 MB | ########## | 100% 2025-05-07T19:43:31.9142295Z 2025-05-07T19:43:31.9142934Z tzdata-2025b | 116 KB | ########## | 100%  2025-05-07T19:43:31.9143208Z 2025-05-07T19:43:31.9216615Z tzdata-2025b | 116 KB | ########## | 100%  2025-05-07T19:43:31.9217153Z pip-25.1 | 1.3 MB | ########## | 100% 2025-05-07T19:43:31.9218184Z pip-25.1 | 1.3 MB | ########## | 100% 2025-05-07T19:43:31.9218802Z 2025-05-07T19:43:31.9219022Z 2025-05-07T19:43:31.9219468Z  done 2025-05-07T19:43:32.0228173Z Preparing transaction: - done 2025-05-07T19:43:32.1242230Z Verifying transaction: | done 2025-05-07T19:43:34.1289165Z Executing transaction: - \ | / - \ | / - \ | / - \ | / - \ | / done 2025-05-07T19:43:34.6703693Z [SETUP] Cleaning up Conda packages ... 2025-05-07T19:43:34.6704731Z + conda clean --packages --tarball -y 2025-05-07T19:43:34.6704998Z 2025-05-07T19:43:35.1082086Z Will remove 99 (117.8 MB) tarball(s). 
2025-05-07T19:43:35.1082453Z Will remove 11 (16.0 MB) package(s). 2025-05-07T19:43:35.1626676Z 2025-05-07T19:43:35.1631189Z + conda clean --all -y 2025-05-07T19:43:35.1631725Z 2025-05-07T19:43:35.6075634Z There are no unused tarball(s) to remove. 2025-05-07T19:43:35.6076217Z Will remove 1 index cache(s). 2025-05-07T19:43:35.6076637Z There are no unused package(s) to remove. 2025-05-07T19:43:35.6076982Z There are no tempfile(s) to remove. 2025-05-07T19:43:35.6077325Z There are no logfile(s) to remove. 2025-05-07T19:43:35.6615106Z 2025-05-07T19:43:35.6615534Z + conda info 2025-05-07T19:43:35.6615729Z 2025-05-07T19:43:36.2247839Z 2025-05-07T19:43:36.2248437Z active environment : base 2025-05-07T19:43:36.2249395Z active env location : /github/home/miniconda 2025-05-07T19:43:36.2250442Z shell level : 1 2025-05-07T19:43:36.2251285Z user config file : /github/home/.condarc 2025-05-07T19:43:36.2252437Z populated config files : /github/home/miniconda/.condarc 2025-05-07T19:43:36.2253680Z conda version : 25.3.1 2025-05-07T19:43:36.2253995Z conda-build version : not installed 2025-05-07T19:43:36.2254354Z python version : 3.13.2.final.0 2025-05-07T19:43:36.2254685Z solver : libmamba (default) 2025-05-07T19:43:36.2255215Z virtual packages : __archspec=1=cascadelake 2025-05-07T19:43:36.2255889Z __conda=25.3.1=0 2025-05-07T19:43:36.2256205Z __glibc=2.34=0 2025-05-07T19:43:36.2256537Z __linux=6.1.130=0 2025-05-07T19:43:36.2256833Z __unix=0=0 2025-05-07T19:43:36.2257341Z base environment : /github/home/miniconda (writable) 2025-05-07T19:43:36.2257953Z conda av data dir : /github/home/miniconda/etc/conda 2025-05-07T19:43:36.2258368Z conda av metadata url : None 2025-05-07T19:43:36.2258775Z channel URLs : https://repo.anaconda.com/pkgs/main/linux-64 2025-05-07T19:43:36.2259270Z https://repo.anaconda.com/pkgs/main/noarch 2025-05-07T19:43:36.2259726Z https://repo.anaconda.com/pkgs/r/linux-64 2025-05-07T19:43:36.2260143Z https://repo.anaconda.com/pkgs/r/noarch 
2025-05-07T19:43:36.2260569Z package cache : /github/home/miniconda/pkgs 2025-05-07T19:43:36.2261087Z /github/home/.conda/pkgs 2025-05-07T19:43:36.2261491Z envs directories : /github/home/miniconda/envs 2025-05-07T19:43:36.2261862Z /github/home/.conda/envs 2025-05-07T19:43:36.2262231Z platform : linux-64 2025-05-07T19:43:36.2263198Z user-agent : conda/25.3.1 requests/2.32.3 CPython/3.13.2 Linux/6.1.130-139.222.amzn2023.x86_64 amzn/2023.7.20250428 glibc/2.34 solver/libmamba conda-libmamba-solver/25.4.0 libmambapy/2.0.5 aau/0.7.0 c/. s/. e/. 2025-05-07T19:43:36.2264231Z UID:GID : 0:0 2025-05-07T19:43:36.2264526Z netrc file : None 2025-05-07T19:43:36.2264795Z offline mode : False 2025-05-07T19:43:36.2265001Z 2025-05-07T19:43:36.2820770Z 2025-05-07T19:43:36.2821472Z [SETUP] Exporting Miniconda variables ... 2025-05-07T19:43:36.2822689Z [SETUP] Saving Miniconda variables to /__w/_temp/_runner_file_commands/add_path_a3f15865-0412-48ef-8381-c7af8347a277 ... 2025-05-07T19:43:36.2823447Z [SETUP] Successfully set up Miniconda at /github/home/miniconda 2025-05-07T19:43:36.2969928Z ##[group]Run . $PRELUDE; create_conda_environment $BUILD_ENV 3.9 2025-05-07T19:43:36.2970537Z . 
$PRELUDE; create_conda_environment $BUILD_ENV 3.9 2025-05-07T19:43:36.2971436Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:43:36.2971809Z env: 2025-05-07T19:43:36.2972059Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:43:36.2972425Z BUILD_ENV: build_binary 2025-05-07T19:43:36.2972700Z BUILD_TARGET: genai 2025-05-07T19:43:36.2972979Z BUILD_VARIANT: cuda 2025-05-07T19:43:36.2973236Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:43:36.2973541Z ##[endgroup] 2025-05-07T19:43:36.7471121Z ################################################################################ 2025-05-07T19:43:36.7471518Z # Create Conda Environment 2025-05-07T19:43:36.7471766Z # 2025-05-07T19:43:36.7486544Z # [2025-05-07T19:43:36.748Z] + create_conda_environment build_binary 3.9 2025-05-07T19:43:36.7487175Z ################################################################################ 2025-05-07T19:43:36.7487422Z 2025-05-07T19:43:36.7506556Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:43:36.8344318Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:43:36.8344782Z [SETUP] Listing existing Conda environments ... 2025-05-07T19:43:36.8345123Z + conda info --envs 2025-05-07T19:43:36.8345283Z 2025-05-07T19:43:37.4103433Z 2025-05-07T19:43:37.4103758Z # conda environments: 2025-05-07T19:43:37.4104076Z # 2025-05-07T19:43:37.4104299Z base /github/home/miniconda 2025-05-07T19:43:37.4689072Z 2025-05-07T19:43:37.4689089Z 2025-05-07T19:43:37.4689726Z [SETUP] Deleting the prefix directory if it exists ... 2025-05-07T19:43:39.1100214Z + rm -rf /github/home/miniconda/envs/build_binary 2025-05-07T19:43:39.1100505Z 2025-05-07T19:43:39.1115092Z 2025-05-07T19:43:39.1121138Z [SETUP] Creating new Conda environment (Python 3.9) ... 
2025-05-07T19:43:39.1143237Z [EXEC] [ATTEMPT 0/3] + conda create -y -n build_binary python=3.9 2025-05-07T19:43:39.6893408Z Channels: 2025-05-07T19:43:39.6893795Z - defaults 2025-05-07T19:43:39.6894056Z Platform: linux-64 2025-05-07T19:43:41.0909026Z Collecting package metadata (repodata.json): - \ | / - \ | / - done 2025-05-07T19:43:41.1915356Z Solving environment: | done 2025-05-07T19:43:41.2205330Z 2025-05-07T19:43:41.2206034Z ## Package Plan ## 2025-05-07T19:43:41.2206543Z 2025-05-07T19:43:41.2207145Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:43:41.2208086Z 2025-05-07T19:43:41.2208362Z added / updated specs: 2025-05-07T19:43:41.2209087Z - python=3.9 2025-05-07T19:43:41.2209499Z 2025-05-07T19:43:41.2209510Z 2025-05-07T19:43:41.2209863Z The following packages will be downloaded: 2025-05-07T19:43:41.2210530Z 2025-05-07T19:43:41.2210885Z package | build 2025-05-07T19:43:41.2211832Z ---------------------------|----------------- 2025-05-07T19:43:41.2213029Z _libgcc_mutex-0.1 | main 3 KB 2025-05-07T19:43:41.2214237Z _openmp_mutex-5.1 | 1_gnu 21 KB 2025-05-07T19:43:41.2215014Z ca-certificates-2025.2.25 | h06a4308_0 129 KB 2025-05-07T19:43:41.2215445Z python-3.9.21 | he870216_1 25.1 MB 2025-05-07T19:43:41.2215925Z setuptools-78.1.1 | py39h06a4308_0 1.7 MB 2025-05-07T19:43:41.2216365Z wheel-0.45.1 | py39h06a4308_0 114 KB 2025-05-07T19:43:41.2216748Z ------------------------------------------------------------ 2025-05-07T19:43:41.2217263Z Total: 27.1 MB 2025-05-07T19:43:41.2217489Z 2025-05-07T19:43:41.2217625Z The following NEW packages will be INSTALLED: 2025-05-07T19:43:41.2217886Z 2025-05-07T19:43:41.2218116Z _libgcc_mutex pkgs/main/linux-64::_libgcc_mutex-0.1-main 2025-05-07T19:43:41.2218611Z _openmp_mutex pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu 2025-05-07T19:43:41.2219519Z ca-certificates pkgs/main/linux-64::ca-certificates-2025.2.25-h06a4308_0 2025-05-07T19:43:41.2220118Z ld_impl_linux-64 
pkgs/main/linux-64::ld_impl_linux-64-2.40-h12ee557_0 2025-05-07T19:43:41.2220607Z libffi pkgs/main/linux-64::libffi-3.4.4-h6a678d5_1 2025-05-07T19:43:41.2221081Z libgcc-ng pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1 2025-05-07T19:43:41.2221565Z libgomp pkgs/main/linux-64::libgomp-11.2.0-h1234567_1 2025-05-07T19:43:41.2222051Z libstdcxx-ng pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1 2025-05-07T19:43:41.2222551Z ncurses pkgs/main/linux-64::ncurses-6.4-h6a678d5_0 2025-05-07T19:43:41.2222995Z openssl pkgs/main/linux-64::openssl-3.0.16-h5eee18b_0 2025-05-07T19:43:41.2223439Z pip pkgs/main/noarch::pip-25.1-pyhc872135_2 2025-05-07T19:43:41.2223876Z python pkgs/main/linux-64::python-3.9.21-he870216_1 2025-05-07T19:43:41.2224336Z readline pkgs/main/linux-64::readline-8.2-h5eee18b_0 2025-05-07T19:43:41.2224847Z setuptools pkgs/main/linux-64::setuptools-78.1.1-py39h06a4308_0 2025-05-07T19:43:41.2225337Z sqlite pkgs/main/linux-64::sqlite-3.45.3-h5eee18b_0 2025-05-07T19:43:41.2225798Z tk pkgs/main/linux-64::tk-8.6.14-h39e8969_0 2025-05-07T19:43:41.2226327Z tzdata pkgs/main/noarch::tzdata-2025b-h04d1e81_0 2025-05-07T19:43:41.2226750Z wheel pkgs/main/linux-64::wheel-0.45.1-py39h06a4308_0 2025-05-07T19:43:41.2227164Z xz pkgs/main/linux-64::xz-5.6.4-h5eee18b_1 2025-05-07T19:43:41.2227541Z zlib pkgs/main/linux-64::zlib-1.2.13-h5eee18b_1 2025-05-07T19:43:41.2227812Z 2025-05-07T19:43:41.2227816Z 2025-05-07T19:43:41.2227820Z 2025-05-07T19:43:41.2227972Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:43:41.2700674Z wheel-0.45.1 | 114 KB | ########## | 100% 2025-05-07T19:43:41.2754510Z ca-certificates-2025 | 129 KB | ########## | 100% 2025-05-07T19:43:41.2819428Z _openmp_mutex-5.1 | 21 KB | ########## | 100% 2025-05-07T19:43:41.2926372Z _libgcc_mutex-0.1 | 3 KB | ########## | 100% 2025-05-07T19:43:41.3206045Z setuptools-78.1.1 | 1.7 MB | ########## | 100% 2025-05-07T19:43:42.0433262Z python-3.9.21 | 25.1 MB | ########## | 100% 2025-05-07T19:43:42.0439035Z done 2025-05-07T19:43:42.2500049Z Preparing transaction: - \ done 2025-05-07T19:43:43.3056126Z Verifying transaction: / - \ | / - \ | / - done 2025-05-07T19:43:45.4195069Z Executing
transaction: | / - \ | / - \ | / - \ | / - \ | / - \ | done 2025-05-07T19:43:45.4236326Z # 2025-05-07T19:43:45.4236709Z # To activate this environment, use 2025-05-07T19:43:45.4237039Z # 2025-05-07T19:43:45.4237279Z # $ conda activate build_binary 2025-05-07T19:43:45.4237874Z # 2025-05-07T19:43:45.4238111Z # To deactivate an active environment, use 2025-05-07T19:43:45.4238603Z # 2025-05-07T19:43:45.4238949Z # $ conda deactivate 2025-05-07T19:43:45.4239110Z 2025-05-07T19:43:45.5066367Z [SETUP] Upgrading PIP to latest ... 2025-05-07T19:43:45.5092033Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install --upgrade pip 2025-05-07T19:43:48.2306137Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-05-07T19:43:48.2307885Z 2025-05-07T19:43:48.2308317Z Requirement already satisfied: pip in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (25.1) 2025-05-07T19:43:48.2308946Z Collecting pip 2025-05-07T19:43:48.2309276Z Downloading pip-25.1.1-py3-none-any.whl.metadata (3.6 kB) 2025-05-07T19:43:48.2310048Z Downloading pip-25.1.1-py3-none-any.whl (1.8 MB) 2025-05-07T19:43:48.2310913Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 109.9 MB/s eta 0:00:00 2025-05-07T19:43:48.2311292Z Installing collected packages: pip 2025-05-07T19:43:48.2311633Z Attempting uninstall: pip 2025-05-07T19:43:48.2311929Z Found existing installation: pip 25.1 2025-05-07T19:43:48.2312260Z Uninstalling pip-25.1: 2025-05-07T19:43:48.2312542Z Successfully uninstalled pip-25.1 2025-05-07T19:43:48.2312876Z Successfully installed pip-25.1.1 2025-05-07T19:43:48.2313075Z 2025-05-07T19:43:48.2903988Z [SETUP] Upgrading pyOpenSSL ... 
2025-05-07T19:43:48.2933702Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y pyOpenSSL>22.1.0 2025-05-07T19:43:48.9603249Z Channels: 2025-05-07T19:43:48.9603723Z - conda-forge 2025-05-07T19:43:48.9605150Z Platform: linux-64 2025-05-07T19:43:58.7113038Z Collecting package metadata (repodata.json): - \ | / - \ | / - \ | / - \ | / - \ done 2025-05-07T19:44:00.4773465Z Solving environment: / - \ | / done 2025-05-07T19:44:00.5218726Z 2025-05-07T19:44:00.5219794Z ## Package Plan ## 2025-05-07T19:44:00.5220268Z 2025-05-07T19:44:00.5220867Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:00.5221822Z 2025-05-07T19:44:00.5222100Z added / updated specs: 2025-05-07T19:44:00.5222868Z - pyopenssl[version='>22.1.0'] 2025-05-07T19:44:00.5223466Z 2025-05-07T19:44:00.5223528Z 2025-05-07T19:44:00.5223875Z The following packages will be downloaded: 2025-05-07T19:44:00.5224544Z 2025-05-07T19:44:00.5224663Z package | build 2025-05-07T19:44:00.5225002Z ---------------------------|----------------- 2025-05-07T19:44:00.5225408Z cffi-1.17.1 | py39h15c3d72_0 236 KB conda-forge 2025-05-07T19:44:00.5225898Z cryptography-44.0.3 | py39h7170ec2_0 1.5 MB conda-forge 2025-05-07T19:44:00.5226695Z libgcc-15.1.0 | h767d61c_2 810 KB conda-forge 2025-05-07T19:44:00.5227154Z libgcc-ng-15.1.0 | h69a702a_2 34 KB conda-forge 2025-05-07T19:44:00.5227595Z libgomp-15.1.0 | h767d61c_2 442 KB conda-forge 2025-05-07T19:44:00.5228048Z openssl-3.5.0 | h7b32b05_1 3.0 MB conda-forge 2025-05-07T19:44:00.5228497Z pycparser-2.22 | pyh29332c3_1 108 KB conda-forge 2025-05-07T19:44:00.5229008Z pyopenssl-25.0.0 | pyhd8ed1ab_0 120 KB conda-forge 2025-05-07T19:44:00.5229460Z python_abi-3.9 | 2_cp39 4 KB conda-forge 2025-05-07T19:44:00.5230088Z typing-extensions-4.13.2 | h0e9735f_0 88 KB conda-forge 2025-05-07T19:44:00.5230594Z typing_extensions-4.13.2 | pyh29332c3_0 51 KB conda-forge 2025-05-07T19:44:00.5231059Z 
------------------------------------------------------------ 2025-05-07T19:44:00.5231453Z Total: 6.3 MB 2025-05-07T19:44:00.5231683Z 2025-05-07T19:44:00.5231820Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:00.5232072Z 2025-05-07T19:44:00.5232292Z cffi conda-forge/linux-64::cffi-1.17.1-py39h15c3d72_0 2025-05-07T19:44:00.5232802Z cryptography conda-forge/linux-64::cryptography-44.0.3-py39h7170ec2_0 2025-05-07T19:44:00.5233336Z libgcc conda-forge/linux-64::libgcc-15.1.0-h767d61c_2 2025-05-07T19:44:00.5233824Z pycparser conda-forge/noarch::pycparser-2.22-pyh29332c3_1 2025-05-07T19:44:00.5234322Z pyopenssl conda-forge/noarch::pyopenssl-25.0.0-pyhd8ed1ab_0 2025-05-07T19:44:00.5234833Z python_abi conda-forge/linux-64::python_abi-3.9-2_cp39 2025-05-07T19:44:00.5235374Z typing-extensions conda-forge/noarch::typing-extensions-4.13.2-h0e9735f_0 2025-05-07T19:44:00.5236016Z typing_extensions conda-forge/noarch::typing_extensions-4.13.2-pyh29332c3_0 2025-05-07T19:44:00.5236540Z 2025-05-07T19:44:00.5236685Z The following packages will be UPDATED: 2025-05-07T19:44:00.5236899Z 2025-05-07T19:44:00.5239861Z ca-certificates pkgs/main/linux-64::ca-certificates-2~ --> conda-forge/noarch::ca-certificates-2025.4.26-hbd8a1cb_0 2025-05-07T19:44:00.5240672Z libgcc-ng pkgs/main::libgcc-ng-11.2.0-h1234567_1 --> conda-forge::libgcc-ng-15.1.0-h69a702a_2 2025-05-07T19:44:00.5241526Z libgomp pkgs/main::libgomp-11.2.0-h1234567_1 --> conda-forge::libgomp-15.1.0-h767d61c_2 2025-05-07T19:44:00.5242202Z openssl pkgs/main::openssl-3.0.16-h5eee18b_0 --> conda-forge::openssl-3.5.0-h7b32b05_1 2025-05-07T19:44:00.5242594Z 2025-05-07T19:44:00.5242615Z 2025-05-07T19:44:00.5242618Z 2025-05-07T19:44:00.5242767Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:44:00.6072027Z cffi-1.17.1 | 236 KB | ########## | 100% 2025-05-07T19:44:00.6104464Z pyopenssl-25.0.0 | 120 KB | ########## | 100% 2025-05-07T19:44:00.6221451Z libgomp-15.1.0 | 442 KB | ########## | 100% 2025-05-07T19:44:00.6567195Z libgcc-15.1.0 | 810 KB | ########## | 100% 2025-05-07T19:44:00.6615881Z cryptography-44.0.3 | 1.5 MB | ########## | 100% 2025-05-07T19:44:00.6661813Z pycparser-2.22 | 108 KB | ########## | 100% 2025-05-07T19:44:00.6693049Z typing-extensions-4. | 88 KB | ########## | 100% 2025-05-07T19:44:00.6693698Z openssl-3.5.0 | 3.0 MB | ########## | 100% 2025-05-07T19:44:00.6972564Z python_abi-3.9 | 4 KB | ########## | 100% 2025-05-07T19:44:00.7179400Z typing_extensions-4. | 51 KB | ########## | 100% 2025-05-07T19:44:00.7197752Z libgcc-ng-15.1.0 | 34 KB | ########## | 100% 2025-05-07T19:44:00.9034584Z done 2025-05-07T19:44:01.0043448Z Preparing transaction: \ done 2025-05-07T19:44:01.1055463Z Verifying transaction: / done 2025-05-07T19:44:02.5093619Z Executing transaction: \ | / - \ | / - \ | / - \ | done 2025-05-07T19:44:02.6054428Z [SETUP] Testing pyOpenSSL import ... 2025-05-07T19:44:04.2868994Z [CHECK] Python (sub-)package 'OpenSSL' found ... 2025-05-07T19:44:04.2889904Z [SETUP] Installing libxcrypt ...
2025-05-07T19:44:04.2918425Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y libxcrypt 2025-05-07T19:44:04.9647103Z Channels: 2025-05-07T19:44:04.9647813Z - conda-forge 2025-05-07T19:44:04.9648462Z Platform: linux-64 2025-05-07T19:44:08.1700873Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:44:08.6021938Z Solving environment: \ | done 2025-05-07T19:44:08.6491595Z 2025-05-07T19:44:08.6492291Z ## Package Plan ## 2025-05-07T19:44:08.6492760Z 2025-05-07T19:44:08.6493868Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:08.6494839Z 2025-05-07T19:44:08.6495261Z added / updated specs: 2025-05-07T19:44:08.6495549Z - libxcrypt 2025-05-07T19:44:08.6495688Z 2025-05-07T19:44:08.6495692Z 2025-05-07T19:44:08.6495823Z The following packages will be downloaded: 2025-05-07T19:44:08.6496088Z 2025-05-07T19:44:08.6496224Z package | build 2025-05-07T19:44:08.6496578Z ---------------------------|----------------- 2025-05-07T19:44:08.6497131Z libxcrypt-4.4.36 | hd590300_1 98 KB conda-forge 2025-05-07T19:44:08.6497767Z ------------------------------------------------------------ 2025-05-07T19:44:08.6498152Z Total: 98 KB 2025-05-07T19:44:08.6498420Z 2025-05-07T19:44:08.6498563Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:08.6498822Z 2025-05-07T19:44:08.6499091Z libxcrypt conda-forge/linux-64::libxcrypt-4.4.36-hd590300_1 2025-05-07T19:44:08.6499434Z 2025-05-07T19:44:08.6499439Z 2025-05-07T19:44:08.6499442Z 2025-05-07T19:44:08.6499599Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:44:08.8545734Z libxcrypt-4.4.36 | 98 KB | ########## | 100% 2025-05-07T19:44:08.8555402Z 2025-05-07T19:44:08.8556166Z done 2025-05-07T19:44:08.9563192Z Preparing transaction: - done 2025-05-07T19:44:09.0573710Z Verifying transaction: | done 2025-05-07T19:44:09.1581806Z Executing transaction: - done 2025-05-07T19:44:12.4641923Z [SETUP] Copying over ... 2025-05-07T19:44:12.4644599Z + cp /github/home/miniconda/envs/build_binary/include/crypt.h /github/home/miniconda/envs/build_binary/include/python3.9/crypt.h 2025-05-07T19:44:12.4646315Z 2025-05-07T19:44:12.4685331Z 2025-05-07T19:44:14.0730862Z [SETUP] Installed Python version: Python 3.9.21 2025-05-07T19:44:14.0732162Z [SETUP] Successfully created Conda environment: build_binary 2025-05-07T19:44:14.0802675Z ##[group]Run . $PRELUDE; install_cxx_compiler $BUILD_ENV clang 2025-05-07T19:44:14.0803168Z .
$PRELUDE; install_cxx_compiler $BUILD_ENV clang 2025-05-07T19:44:14.0803798Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:44:14.0804135Z env: 2025-05-07T19:44:14.0804357Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:44:14.0804674Z BUILD_ENV: build_binary 2025-05-07T19:44:14.0804909Z BUILD_TARGET: genai 2025-05-07T19:44:14.0805151Z BUILD_VARIANT: cuda 2025-05-07T19:44:14.0805379Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:44:14.0805638Z ##[endgroup] 2025-05-07T19:44:14.5261325Z ################################################################################ 2025-05-07T19:44:14.5261929Z # Install C/C++ Compilers 2025-05-07T19:44:14.5262200Z # 2025-05-07T19:44:14.5275836Z # [2025-05-07T19:44:14.527Z] + install_cxx_compiler build_binary clang 2025-05-07T19:44:14.5277313Z ################################################################################ 2025-05-07T19:44:14.5278014Z 2025-05-07T19:44:14.5292569Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:44:14.6159389Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:44:14.6166382Z [INSTALL] Installing GLIBC (architecture = 64) ... 
2025-05-07T19:44:14.6189157Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y sysroot_linux-64=2.17 2025-05-07T19:44:15.2780030Z Channels: 2025-05-07T19:44:15.2780293Z - conda-forge 2025-05-07T19:44:15.2780556Z Platform: linux-64 2025-05-07T19:44:18.3383848Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:44:18.7695661Z Solving environment: \ | done 2025-05-07T19:44:18.8148466Z 2025-05-07T19:44:18.8149403Z ## Package Plan ## 2025-05-07T19:44:18.8149663Z 2025-05-07T19:44:18.8149905Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:18.8150298Z 2025-05-07T19:44:18.8150415Z added / updated specs: 2025-05-07T19:44:18.8150766Z - sysroot_linux-64=2.17 2025-05-07T19:44:18.8150981Z 2025-05-07T19:44:18.8150985Z 2025-05-07T19:44:18.8151123Z The following packages will be downloaded: 2025-05-07T19:44:18.8151362Z 2025-05-07T19:44:18.8151517Z package | build 2025-05-07T19:44:18.8151872Z ---------------------------|----------------- 2025-05-07T19:44:18.8152366Z kernel-headers_linux-64-3.10.0| he073ed8_18 921 KB conda-forge 2025-05-07T19:44:18.8152911Z sysroot_linux-64-2.17 | h0157908_18 14.5 MB conda-forge 2025-05-07T19:44:18.8153395Z ------------------------------------------------------------ 2025-05-07T19:44:18.8153798Z Total: 15.4 MB 2025-05-07T19:44:18.8154067Z 2025-05-07T19:44:18.8154213Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:18.8154458Z 2025-05-07T19:44:18.8154821Z kernel-headers_li~ conda-forge/noarch::kernel-headers_linux-64-3.10.0-he073ed8_18 2025-05-07T19:44:18.8155468Z sysroot_linux-64 conda-forge/noarch::sysroot_linux-64-2.17-h0157908_18 2025-05-07T19:44:18.8155842Z 2025-05-07T19:44:18.8155846Z 2025-05-07T19:44:18.8155851Z 2025-05-07T19:44:18.8156012Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:44:19.1227876Z kernel-headers_linux | 921 KB | ########## | 100% 2025-05-07T19:44:19.6916862Z sysroot_linux-64-2.1 | 14.5 MB | ########## | 100% 2025-05-07T19:44:19.6923798Z done 2025-05-07T19:44:19.7931479Z Preparing transaction: - done 2025-05-07T19:44:19.9941325Z Verifying transaction: | / done 2025-05-07T19:44:20.0958260Z Executing transaction: \ done 2025-05-07T19:44:20.1778902Z [CHECK] LD_LIBRARY_PATH = 2025-05-07T19:44:20.1779408Z [CHECK] CONDA_PREFIX is not set. 2025-05-07T19:44:21.7885542Z [CHECK] libstdc++.so.6 found in CONDA_PREFIX PATH (symbolic link): /github/home/miniconda/envs/build_binary/lib/libstdc++.so.6 2025-05-07T19:44:21.7899107Z [INSTALL] Installing GCC (11.4.0, 64) through Conda ...
2025-05-07T19:44:21.7932698Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y gxx_linux-64=11.4.0 2025-05-07T19:44:22.5019008Z Channels: 2025-05-07T19:44:22.5019708Z - conda-forge 2025-05-07T19:44:22.5020354Z Platform: linux-64 2025-05-07T19:44:25.6060378Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:44:26.7473046Z Solving environment: \ | / done 2025-05-07T19:44:26.7967091Z 2025-05-07T19:44:26.7968302Z ## Package Plan ## 2025-05-07T19:44:26.7968857Z 2025-05-07T19:44:26.7969471Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:26.7970440Z 2025-05-07T19:44:26.7970727Z added / updated specs: 2025-05-07T19:44:26.7971559Z - gxx_linux-64=11.4.0 2025-05-07T19:44:26.7972033Z 2025-05-07T19:44:26.7972045Z 2025-05-07T19:44:26.7972402Z The following packages will be downloaded: 2025-05-07T19:44:26.7973066Z 2025-05-07T19:44:26.7973431Z package | build 2025-05-07T19:44:26.7974409Z ---------------------------|----------------- 2025-05-07T19:44:26.7975669Z binutils_impl_linux-64-2.40| ha1999f0_7 6.0 MB conda-forge 2025-05-07T19:44:26.7977351Z binutils_linux-64-2.40 | hb3c18ed_4 28 KB conda-forge 2025-05-07T19:44:26.7978672Z gcc_impl_linux-64-11.4.0 | h00c12a0_13 53.0 MB conda-forge 2025-05-07T19:44:26.7979187Z gcc_linux-64-11.4.0 | ha077dfb_4 31 KB conda-forge 2025-05-07T19:44:26.7979676Z gxx_impl_linux-64-11.4.0 | h634f3ee_13 11.2 MB conda-forge 2025-05-07T19:44:26.7980187Z gxx_linux-64-11.4.0 | h35bfe5d_4 29 KB conda-forge 2025-05-07T19:44:26.7980677Z ld_impl_linux-64-2.40 | hf3520f5_7 691 KB conda-forge 2025-05-07T19:44:26.7981216Z libgcc-devel_linux-64-11.4.0| h8f596e0_113 2.3 MB conda-forge 2025-05-07T19:44:26.7981744Z libsanitizer-11.4.0 | h5763a12_13 3.5 MB conda-forge 2025-05-07T19:44:26.7982267Z libstdcxx-15.1.0 | h8f9b012_2 3.7 MB conda-forge 2025-05-07T19:44:26.7982823Z libstdcxx-devel_linux-64-11.4.0| h8f596e0_113 11.1 MB conda-forge 2025-05-07T19:44:26.7983356Z 
libstdcxx-ng-15.1.0 | h4852527_2 34 KB conda-forge 2025-05-07T19:44:26.7983832Z ------------------------------------------------------------ 2025-05-07T19:44:26.7984216Z Total: 91.6 MB 2025-05-07T19:44:26.7984479Z 2025-05-07T19:44:26.7984622Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:26.7984864Z 2025-05-07T19:44:26.7985207Z binutils_impl_lin~ conda-forge/linux-64::binutils_impl_linux-64-2.40-ha1999f0_7 2025-05-07T19:44:26.8005829Z binutils_linux-64 conda-forge/linux-64::binutils_linux-64-2.40-hb3c18ed_4 2025-05-07T19:44:26.8006803Z gcc_impl_linux-64 conda-forge/linux-64::gcc_impl_linux-64-11.4.0-h00c12a0_13 2025-05-07T19:44:26.8007674Z gcc_linux-64 conda-forge/linux-64::gcc_linux-64-11.4.0-ha077dfb_4 2025-05-07T19:44:26.8008257Z gxx_impl_linux-64 conda-forge/linux-64::gxx_impl_linux-64-11.4.0-h634f3ee_13 2025-05-07T19:44:26.8008886Z gxx_linux-64 conda-forge/linux-64::gxx_linux-64-11.4.0-h35bfe5d_4 2025-05-07T19:44:26.8009490Z libgcc-devel_linu~ conda-forge/noarch::libgcc-devel_linux-64-11.4.0-h8f596e0_113 2025-05-07T19:44:26.8010137Z libsanitizer conda-forge/linux-64::libsanitizer-11.4.0-h5763a12_13 2025-05-07T19:44:26.8010718Z libstdcxx conda-forge/linux-64::libstdcxx-15.1.0-h8f9b012_2 2025-05-07T19:44:26.8011320Z libstdcxx-devel_l~ conda-forge/noarch::libstdcxx-devel_linux-64-11.4.0-h8f596e0_113 2025-05-07T19:44:26.8011774Z 2025-05-07T19:44:26.8011907Z The following packages will be UPDATED: 2025-05-07T19:44:26.8012143Z 2025-05-07T19:44:26.8012491Z ld_impl_linux-64 pkgs/main::ld_impl_linux-64-2.40-h12e~ --> conda-forge::ld_impl_linux-64-2.40-hf3520f5_7 2025-05-07T19:44:26.8013320Z libstdcxx-ng pkgs/main::libstdcxx-ng-11.2.0-h12345~ --> conda-forge::libstdcxx-ng-15.1.0-h4852527_2 2025-05-07T19:44:26.8013774Z 2025-05-07T19:44:26.8013808Z 2025-05-07T19:44:26.8013811Z 2025-05-07T19:44:26.8013974Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:44:27.0464320Z libstdcxx-15.1.0 | 3.7 MB | ########## | 100% 2025-05-07T19:44:27.0783344Z binutils_impl_linux- | 6.0 MB | ########## | 100% 2025-05-07T19:44:27.1305173Z libstdcxx-devel_linu | 11.1 MB | ########## | 100% 2025-05-07T19:44:27.1799855Z libsanitizer-11.4.0 | 3.5 MB | ########## | 100% 2025-05-07T19:44:27.1898492Z libgcc-devel_linux-6 | 2.3 MB | ########## | 100% 2025-05-07T19:44:27.2093456Z ld_impl_linux-64-2.4 | 691 KB | ########## | 100% 2025-05-07T19:44:27.2367135Z gcc_linux-64-11.4.0 | 31 KB | ########## | 100% 2025-05-07T19:44:27.2403347Z gxx_linux-64-11.4.0 | 29 KB | ########## | 100% 2025-05-07T19:44:27.2468652Z libstdcxx-ng-15.1.0 | 34 KB | ########## | 100% 2025-05-07T19:44:27.3563957Z binutils_linux-64-2. | 28 KB | ########## | 100% 2025-05-07T19:44:27.3845960Z gxx_impl_linux-64-11 | 11.2 MB | ########## | 100% 2025-05-07T19:44:27.7475399Z gcc_impl_linux-64-11 | 53.0 MB | ########## | 100% 2025-05-07T19:44:28.2741538Z done 2025-05-07T19:44:28.3744861Z Preparing transaction: \ done 2025-05-07T19:44:28.6761307Z Verifying transaction: / - \ done 2025-05-07T19:44:28.7778376Z Executing transaction: / done 2025-05-07T19:44:28.8683966Z [INSTALL] Setting the C/C++ compiler symlinks ... 
2025-05-07T19:44:32.5923802Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-cc /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:44:32.5925755Z 2025-05-07T19:44:32.5937019Z 2025-05-07T19:44:32.5955991Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-cc /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:44:32.5957855Z 2025-05-07T19:44:32.5966754Z 2025-05-07T19:44:32.5985273Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:44:32.5987065Z 2025-05-07T19:44:32.5997315Z 2025-05-07T19:44:32.6017009Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:44:32.6019061Z 2025-05-07T19:44:32.6025649Z 2025-05-07T19:44:32.6033588Z [INSTALL] Installing Clang (16.0.6, 64) and relevant libraries through Conda ... 2025-05-07T19:44:32.6059263Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y clangxx=16.0.6 libcxx llvm-openmp=16.0.6 compiler-rt=16.0.6 2025-05-07T19:44:33.3141059Z Channels: 2025-05-07T19:44:33.3141764Z - conda-forge 2025-05-07T19:44:33.3142424Z Platform: linux-64 2025-05-07T19:44:36.3331463Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:44:37.6786672Z Solving environment: \ | / done 2025-05-07T19:44:37.7299915Z 2025-05-07T19:44:37.7300479Z ## Package Plan ## 2025-05-07T19:44:37.7300956Z 2025-05-07T19:44:37.7301673Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:37.7302609Z 2025-05-07T19:44:37.7302902Z added / updated specs: 2025-05-07T19:44:37.7303662Z - clangxx=16.0.6 2025-05-07T19:44:37.7304343Z - compiler-rt=16.0.6 2025-05-07T19:44:37.7305054Z - libcxx 2025-05-07T19:44:37.7305658Z - llvm-openmp=16.0.6 2025-05-07T19:44:37.7306141Z 2025-05-07T19:44:37.7306154Z 2025-05-07T19:44:37.7306505Z The following 
packages will be downloaded: 2025-05-07T19:44:37.7307163Z 2025-05-07T19:44:37.7307537Z package | build 2025-05-07T19:44:37.7308507Z ---------------------------|----------------- 2025-05-07T19:44:37.7309213Z clang-16.0.6 |default_h9e3a008_14 110 KB conda-forge 2025-05-07T19:44:37.7309705Z clang-16-16.0.6 |default_hb5137d0_14 780 KB conda-forge 2025-05-07T19:44:37.7310230Z clangxx-16.0.6 |default_ha78316a_14 110 KB conda-forge 2025-05-07T19:44:37.7310744Z compiler-rt-16.0.6 | h00ab1b0_2 107 KB conda-forge 2025-05-07T19:44:37.7311251Z compiler-rt_linux-64-16.0.6| h00ab1b0_2 36.0 MB conda-forge 2025-05-07T19:44:37.7311741Z icu-73.2 | h59595ed_0 11.5 MB conda-forge 2025-05-07T19:44:37.7312220Z libclang-cpp16-16.0.6 |default_hb5137d0_14 17.3 MB conda-forge 2025-05-07T19:44:37.7312739Z libcxx-19.1.7 | h2713693_1 1000 KB conda-forge 2025-05-07T19:44:37.7313199Z libcxxabi-19.1.7 | hd85fd95_1 158 KB conda-forge 2025-05-07T19:44:37.7313982Z libiconv-1.18 | h4ce23a2_1 696 KB conda-forge 2025-05-07T19:44:37.7314470Z libllvm16-16.0.6 | hb3ce162_3 33.7 MB conda-forge 2025-05-07T19:44:37.7315075Z libxml2-2.12.7 | hc051c1a_1 688 KB conda-forge 2025-05-07T19:44:37.7315694Z libzlib-1.2.13 | h4ab18f5_6 60 KB conda-forge 2025-05-07T19:44:37.7316267Z llvm-openmp-16.0.6 | h4dfa4b3_0 39.9 MB conda-forge 2025-05-07T19:44:37.7316732Z zlib-1.2.13 | h4ab18f5_6 91 KB conda-forge 2025-05-07T19:44:37.7317136Z zstd-1.5.6 | ha6fb4c9_0 542 KB conda-forge 2025-05-07T19:44:37.7317550Z ------------------------------------------------------------ 2025-05-07T19:44:37.7318113Z Total: 142.6 MB 2025-05-07T19:44:37.7318346Z 2025-05-07T19:44:37.7318490Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:37.7318750Z 2025-05-07T19:44:37.7319175Z clang conda-forge/linux-64::clang-16.0.6-default_h9e3a008_14 2025-05-07T19:44:37.7319708Z clang-16 conda-forge/linux-64::clang-16-16.0.6-default_hb5137d0_14 2025-05-07T19:44:37.7320297Z clangxx conda-forge/linux-64::clangxx-16.0.6-default_ha78316a_14 
2025-05-07T19:44:37.7320863Z compiler-rt conda-forge/linux-64::compiler-rt-16.0.6-h00ab1b0_2 2025-05-07T19:44:37.7321456Z compiler-rt_linux~ conda-forge/noarch::compiler-rt_linux-64-16.0.6-h00ab1b0_2 2025-05-07T19:44:37.7322021Z icu conda-forge/linux-64::icu-73.2-h59595ed_0 2025-05-07T19:44:37.7322568Z libclang-cpp16 conda-forge/linux-64::libclang-cpp16-16.0.6-default_hb5137d0_14 2025-05-07T19:44:37.7323167Z libcxx conda-forge/linux-64::libcxx-19.1.7-h2713693_1 2025-05-07T19:44:37.7323687Z libcxxabi conda-forge/linux-64::libcxxabi-19.1.7-hd85fd95_1 2025-05-07T19:44:37.7324195Z libiconv conda-forge/linux-64::libiconv-1.18-h4ce23a2_1 2025-05-07T19:44:37.7324723Z libllvm16 conda-forge/linux-64::libllvm16-16.0.6-hb3ce162_3 2025-05-07T19:44:37.7325361Z libxml2 conda-forge/linux-64::libxml2-2.12.7-hc051c1a_1 2025-05-07T19:44:37.7325856Z libzlib conda-forge/linux-64::libzlib-1.2.13-h4ab18f5_6 2025-05-07T19:44:37.7326483Z llvm-openmp conda-forge/linux-64::llvm-openmp-16.0.6-h4dfa4b3_0 2025-05-07T19:44:37.7329338Z zstd conda-forge/linux-64::zstd-1.5.6-ha6fb4c9_0 2025-05-07T19:44:37.7329675Z 2025-05-07T19:44:37.7329808Z The following packages will be UPDATED: 2025-05-07T19:44:37.7330038Z 2025-05-07T19:44:37.7330330Z zlib pkgs/main::zlib-1.2.13-h5eee18b_1 --> conda-forge::zlib-1.2.13-h4ab18f5_6 2025-05-07T19:44:37.7330705Z 2025-05-07T19:44:37.7330709Z 2025-05-07T19:44:37.7330714Z 2025-05-07T19:44:37.7330879Z Downloading and Extracting Packages: ...working... 2025-05-07T19:44:37.7331330Z llvm-openmp-16.0.6 | 39.9 MB | | 0% 2025-05-07T19:44:37.7331589Z 2025-05-07T19:44:37.7331917Z compiler-rt_linux-64 | 36.0 MB | | 0%  2025-05-07T19:44:37.7332219Z 2025-05-07T19:44:37.7332223Z 2025-05-07T19:44:37.7332465Z libllvm16-16.0.6 | 33.7 MB | | 0%  2025-05-07T19:44:37.7332742Z 2025-05-07T19:44:37.7332746Z 2025-05-07T19:44:37.7332776Z 2025-05-07T19:44:37.7333026Z libclang-cpp16-16.0. 
| 17.3 MB | | 0% 2025-05-07T19:44:37.7381912Z clang-16.0.6 | 110 KB | | 0% 2025-05-07T19:44:37.7383761Z clangxx-16.0.6 | 110 KB | | 0% 2025-05-07T19:44:37.7391042Z compiler-rt-16.0.6 | 107 KB | | 0% 2025-05-07T19:44:37.7392956Z zlib-1.2.13 | 91 KB | | 0% 2025-05-07T19:44:37.8336650Z libzlib-1.2.13 | 60 KB | | 0% 2025-05-07T19:44:38.4570228Z icu-73.2 | 11.5 MB | ########## | 100% 2025-05-07T19:44:38.5130270Z libllvm16-16.0.6 | 33.7 MB | #########3 | 93% 2025-05-07T19:44:38.5570398Z libcxx-19.1.7 | 1000 KB | ########## | 100% 2025-05-07T19:44:38.6129658Z clang-16-16.0.6 | 780 KB | ########## | 100% 2025-05-07T19:44:38.6414527Z libclang-cpp16-16.0. | 17.3 MB | ########## | 100% 2025-05-07T19:44:38.7167546Z libiconv-1.18 | 696 KB | ########## | 100% 2025-05-07T19:44:38.7320951Z compiler-rt_linux-64 | 36.0 MB | #######3 | 73% 2025-05-07T19:44:38.7740361Z llvm-openmp-16.0.6 | 39.9 MB | #########9 | 100% 2025-05-07T19:44:38.7931361Z libxml2-2.12.7 | 688 KB | ########## | 100% 2025-05-07T19:44:38.8215276Z zstd-1.5.6 | 542 KB | ########## | 100% 2025-05-07T19:44:38.8433613Z libcxxabi-19.1.7 | 158 KB | ########## | 100% 
2025-05-07T19:44:38.8434013Z 2025-05-07T19:44:38.8434016Z 2025-05-07T19:44:38.8434020Z 2025-05-07T19:44:38.8434023Z 2025-05-07T19:44:38.8434027Z 2025-05-07T19:44:38.8434030Z 2025-05-07T19:44:38.8434034Z 2025-05-07T19:44:38.8463944Z clang-16.0.6 | 110 KB | #4 | 15%  2025-05-07T19:44:38.8464914Z 2025-05-07T19:44:38.8464928Z 2025-05-07T19:44:38.8464970Z 2025-05-07T19:44:38.8464981Z 2025-05-07T19:44:38.8464991Z 2025-05-07T19:44:38.8465002Z 2025-05-07T19:44:38.8465012Z 2025-05-07T19:44:38.8465022Z 2025-05-07T19:44:38.8465033Z 2025-05-07T19:44:38.8465044Z 2025-05-07T19:44:38.8465054Z 2025-05-07T19:44:38.8640636Z clang-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:38.8641002Z 2025-05-07T19:44:38.8876606Z compiler-rt_linux-64 | 36.0 MB | ########5 | 86%  2025-05-07T19:44:38.8876937Z 2025-05-07T19:44:38.8877008Z 2025-05-07T19:44:38.8877013Z 2025-05-07T19:44:38.8877096Z 2025-05-07T19:44:38.8877103Z 2025-05-07T19:44:38.8877109Z 2025-05-07T19:44:38.8877114Z 2025-05-07T19:44:38.8877119Z 2025-05-07T19:44:38.8877123Z 2025-05-07T19:44:38.8877128Z 2025-05-07T19:44:38.8877206Z 2025-05-07T19:44:38.8877214Z 2025-05-07T19:44:38.8903407Z clangxx-16.0.6 | 110 KB | #4 | 15%  2025-05-07T19:44:38.8903773Z 2025-05-07T19:44:38.8903778Z 2025-05-07T19:44:38.8903800Z 2025-05-07T19:44:38.8903803Z 2025-05-07T19:44:38.8903807Z 2025-05-07T19:44:38.8903822Z 2025-05-07T19:44:38.8903825Z 2025-05-07T19:44:38.8903829Z 2025-05-07T19:44:38.8903833Z 2025-05-07T19:44:38.8903836Z 2025-05-07T19:44:38.8903840Z 2025-05-07T19:44:38.8903843Z 2025-05-07T19:44:38.9052532Z clangxx-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:38.9053106Z 2025-05-07T19:44:38.9053201Z 2025-05-07T19:44:38.9053206Z 2025-05-07T19:44:38.9053210Z 2025-05-07T19:44:38.9053214Z 2025-05-07T19:44:38.9053217Z 2025-05-07T19:44:38.9053220Z 2025-05-07T19:44:38.9053224Z 2025-05-07T19:44:38.9053236Z 2025-05-07T19:44:38.9053239Z 2025-05-07T19:44:38.9053243Z 2025-05-07T19:44:38.9053246Z 2025-05-07T19:44:38.9053325Z 
2025-05-07T19:44:38.9077842Z compiler-rt-16.0.6 | 107 KB | #4 | 15%  2025-05-07T19:44:38.9078261Z 2025-05-07T19:44:38.9078266Z 2025-05-07T19:44:38.9078269Z 2025-05-07T19:44:38.9078295Z 2025-05-07T19:44:38.9078299Z 2025-05-07T19:44:38.9078302Z 2025-05-07T19:44:38.9078306Z 2025-05-07T19:44:38.9078309Z 2025-05-07T19:44:38.9078313Z 2025-05-07T19:44:38.9078316Z 2025-05-07T19:44:38.9078320Z 2025-05-07T19:44:38.9078323Z 2025-05-07T19:44:38.9078327Z 2025-05-07T19:44:38.9386470Z compiler-rt-16.0.6 | 107 KB | ########## | 100%  2025-05-07T19:44:38.9386834Z 2025-05-07T19:44:38.9386839Z 2025-05-07T19:44:38.9386842Z 2025-05-07T19:44:38.9386846Z 2025-05-07T19:44:38.9386849Z 2025-05-07T19:44:38.9386853Z 2025-05-07T19:44:38.9386856Z 2025-05-07T19:44:38.9386860Z 2025-05-07T19:44:38.9386863Z 2025-05-07T19:44:38.9386867Z 2025-05-07T19:44:38.9386870Z 2025-05-07T19:44:38.9386873Z 2025-05-07T19:44:38.9386900Z 2025-05-07T19:44:38.9386904Z 2025-05-07T19:44:38.9410186Z zlib-1.2.13 | 91 KB | #7 | 18%  2025-05-07T19:44:38.9410780Z 2025-05-07T19:44:38.9410843Z 2025-05-07T19:44:38.9411077Z 2025-05-07T19:44:38.9411100Z 2025-05-07T19:44:38.9411140Z 2025-05-07T19:44:38.9411145Z 2025-05-07T19:44:38.9411149Z 2025-05-07T19:44:38.9411154Z 2025-05-07T19:44:38.9411158Z 2025-05-07T19:44:38.9411163Z 2025-05-07T19:44:38.9411168Z 2025-05-07T19:44:38.9411172Z 2025-05-07T19:44:38.9411177Z 2025-05-07T19:44:38.9411182Z 2025-05-07T19:44:38.9479626Z zlib-1.2.13 | 91 KB | ########## | 100%  2025-05-07T19:44:38.9480043Z 2025-05-07T19:44:38.9480049Z 2025-05-07T19:44:38.9480054Z 2025-05-07T19:44:38.9480058Z 2025-05-07T19:44:38.9480063Z 2025-05-07T19:44:38.9480067Z 2025-05-07T19:44:38.9480072Z 2025-05-07T19:44:38.9480076Z 2025-05-07T19:44:38.9480081Z 2025-05-07T19:44:38.9480085Z 2025-05-07T19:44:38.9480090Z 2025-05-07T19:44:38.9480095Z 2025-05-07T19:44:38.9480099Z 2025-05-07T19:44:38.9480104Z 2025-05-07T19:44:38.9480108Z 2025-05-07T19:44:38.9503082Z libzlib-1.2.13 | 60 KB | ##6 | 27%  2025-05-07T19:44:38.9503453Z 
2025-05-07T19:44:38.9503458Z 2025-05-07T19:44:38.9503462Z 2025-05-07T19:44:38.9503465Z 2025-05-07T19:44:38.9503469Z 2025-05-07T19:44:38.9503472Z 2025-05-07T19:44:38.9503476Z 2025-05-07T19:44:38.9503479Z 2025-05-07T19:44:38.9503483Z 2025-05-07T19:44:38.9503486Z 2025-05-07T19:44:38.9503490Z 2025-05-07T19:44:38.9503499Z 2025-05-07T19:44:38.9503524Z 2025-05-07T19:44:38.9503528Z 2025-05-07T19:44:38.9503532Z 2025-05-07T19:44:38.9725089Z libzlib-1.2.13 | 60 KB | ########## | 100%  2025-05-07T19:44:38.9725442Z 2025-05-07T19:44:38.9760853Z compiler-rt_linux-64 | 36.0 MB | #########6 | 97%  2025-05-07T19:44:38.9761152Z 2025-05-07T19:44:38.9761376Z 2025-05-07T19:44:38.9761384Z 2025-05-07T19:44:38.9761389Z 2025-05-07T19:44:38.9761435Z 2025-05-07T19:44:38.9761440Z 2025-05-07T19:44:38.9761906Z clang-16-16.0.6 | 780 KB | ########## | 100%  2025-05-07T19:44:38.9762251Z 2025-05-07T19:44:38.9762255Z 2025-05-07T19:44:38.9762259Z 2025-05-07T19:44:38.9762263Z 2025-05-07T19:44:38.9762266Z 2025-05-07T19:44:38.9762270Z 2025-05-07T19:44:39.0489854Z clang-16-16.0.6 | 780 KB | ########## | 100%  2025-05-07T19:44:39.0490507Z 2025-05-07T19:44:39.0490521Z 2025-05-07T19:44:39.0490527Z 2025-05-07T19:44:39.0490672Z 2025-05-07T19:44:39.0490680Z 2025-05-07T19:44:39.0490685Z 2025-05-07T19:44:39.0490691Z 2025-05-07T19:44:39.0491540Z libiconv-1.18 | 696 KB | ########## | 100%  2025-05-07T19:44:39.0491910Z 2025-05-07T19:44:39.0491915Z 2025-05-07T19:44:39.0491918Z 2025-05-07T19:44:39.0491923Z 2025-05-07T19:44:39.0491927Z 2025-05-07T19:44:39.0491930Z 2025-05-07T19:44:39.0491951Z 2025-05-07T19:44:39.0505427Z libiconv-1.18 | 696 KB | ########## | 100%  2025-05-07T19:44:39.0505731Z 2025-05-07T19:44:39.0507370Z 2025-05-07T19:44:39.1058017Z libllvm16-16.0.6 | 33.7 MB | ########## | 100%  2025-05-07T19:44:39.1058432Z 2025-05-07T19:44:39.1058436Z 2025-05-07T19:44:39.1058440Z 2025-05-07T19:44:39.1058443Z 2025-05-07T19:44:39.1058447Z 2025-05-07T19:44:39.1058450Z 2025-05-07T19:44:39.1058453Z 
2025-05-07T19:44:39.1058457Z 2025-05-07T19:44:39.1058745Z libxml2-2.12.7 | 688 KB | ########## | 100%  2025-05-07T19:44:39.1059064Z 2025-05-07T19:44:39.1059069Z 2025-05-07T19:44:39.1059073Z 2025-05-07T19:44:39.1059076Z 2025-05-07T19:44:39.1059080Z 2025-05-07T19:44:39.1059083Z 2025-05-07T19:44:39.1059087Z 2025-05-07T19:44:39.1059100Z 2025-05-07T19:44:39.1224771Z libxml2-2.12.7 | 688 KB | ########## | 100%  2025-05-07T19:44:39.1225116Z 2025-05-07T19:44:39.1225121Z 2025-05-07T19:44:39.1225124Z 2025-05-07T19:44:39.1225128Z 2025-05-07T19:44:39.1467176Z icu-73.2 | 11.5 MB | ########## | 100%  2025-05-07T19:44:39.1467655Z 2025-05-07T19:44:39.1467660Z 2025-05-07T19:44:39.1467664Z 2025-05-07T19:44:39.1467901Z 2025-05-07T19:44:39.1467905Z 2025-05-07T19:44:39.1467910Z 2025-05-07T19:44:39.1467915Z 2025-05-07T19:44:39.1467918Z 2025-05-07T19:44:39.1467922Z 2025-05-07T19:44:39.1470648Z zstd-1.5.6 | 542 KB | ########## | 100%  2025-05-07T19:44:39.1470963Z 2025-05-07T19:44:39.1470966Z 2025-05-07T19:44:39.1471086Z 2025-05-07T19:44:39.1471090Z 2025-05-07T19:44:39.1471095Z 2025-05-07T19:44:39.1471098Z 2025-05-07T19:44:39.1471102Z 2025-05-07T19:44:39.1471105Z 2025-05-07T19:44:39.1471113Z 2025-05-07T19:44:39.1528417Z zstd-1.5.6 | 542 KB | ########## | 100%  2025-05-07T19:44:39.1528764Z 2025-05-07T19:44:39.1528769Z 2025-05-07T19:44:39.1528773Z 2025-05-07T19:44:39.1528776Z 2025-05-07T19:44:39.1528780Z 2025-05-07T19:44:39.1528783Z 2025-05-07T19:44:39.1528787Z 2025-05-07T19:44:39.1528790Z 2025-05-07T19:44:39.1528794Z 2025-05-07T19:44:39.1528797Z 2025-05-07T19:44:39.1529086Z libcxxabi-19.1.7 | 158 KB | ########## | 100%  2025-05-07T19:44:39.1529435Z 2025-05-07T19:44:39.1529438Z 2025-05-07T19:44:39.1529442Z 2025-05-07T19:44:39.1529445Z 2025-05-07T19:44:39.1529449Z 2025-05-07T19:44:39.1529456Z 2025-05-07T19:44:39.1529460Z 2025-05-07T19:44:39.1529463Z 2025-05-07T19:44:39.1529467Z 2025-05-07T19:44:39.1529470Z 2025-05-07T19:44:39.1609149Z libcxxabi-19.1.7 | 158 KB | ########## | 100%  
2025-05-07T19:44:39.2474572Z llvm-openmp-16.0.6 | 39.9 MB | ########## | 100% 2025-05-07T19:44:39.2474841Z 2025-05-07T19:44:39.2474846Z 2025-05-07T19:44:39.2474850Z 2025-05-07T19:44:39.2474854Z 2025-05-07T19:44:39.2474857Z 2025-05-07T19:44:39.2474861Z 2025-05-07T19:44:39.2474864Z 2025-05-07T19:44:39.2474867Z 2025-05-07T19:44:39.2474870Z 2025-05-07T19:44:39.2474883Z 2025-05-07T19:44:39.2474886Z 2025-05-07T19:44:39.2475241Z clang-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:39.2475543Z 2025-05-07T19:44:39.2475557Z 2025-05-07T19:44:39.2475560Z 2025-05-07T19:44:39.2475564Z 2025-05-07T19:44:39.2475567Z 2025-05-07T19:44:39.2475570Z 2025-05-07T19:44:39.2475574Z 2025-05-07T19:44:39.2475577Z 2025-05-07T19:44:39.2475580Z 2025-05-07T19:44:39.2475584Z 2025-05-07T19:44:39.2475613Z 2025-05-07T19:44:39.2514416Z clang-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:39.2514745Z 2025-05-07T19:44:39.2514862Z 2025-05-07T19:44:39.2514870Z 2025-05-07T19:44:39.2514877Z 2025-05-07T19:44:39.2514881Z 2025-05-07T19:44:39.2514924Z 2025-05-07T19:44:39.2514928Z 2025-05-07T19:44:39.2514933Z 2025-05-07T19:44:39.2514937Z 2025-05-07T19:44:39.2514942Z 2025-05-07T19:44:39.2514946Z 2025-05-07T19:44:39.2514951Z 2025-05-07T19:44:39.2516133Z clangxx-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:39.2516504Z 2025-05-07T19:44:39.2516508Z 2025-05-07T19:44:39.2516512Z 2025-05-07T19:44:39.2516515Z 2025-05-07T19:44:39.2516534Z 2025-05-07T19:44:39.2516537Z 2025-05-07T19:44:39.2516541Z 2025-05-07T19:44:39.2516544Z 2025-05-07T19:44:39.2516548Z 2025-05-07T19:44:39.2516551Z 2025-05-07T19:44:39.2516555Z 2025-05-07T19:44:39.2516559Z 2025-05-07T19:44:39.2739252Z clangxx-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:39.2739641Z 2025-05-07T19:44:39.2739646Z 2025-05-07T19:44:39.2739650Z 2025-05-07T19:44:39.2739653Z 2025-05-07T19:44:39.2739657Z 2025-05-07T19:44:39.2739660Z 2025-05-07T19:44:39.2739664Z 2025-05-07T19:44:39.2739667Z 2025-05-07T19:44:39.2739671Z 2025-05-07T19:44:39.2739674Z 
2025-05-07T19:44:39.2739678Z 2025-05-07T19:44:39.2739682Z 2025-05-07T19:44:39.2739685Z 2025-05-07T19:44:39.2739689Z 2025-05-07T19:44:39.2741853Z zlib-1.2.13 | 91 KB | ########## | 100%  2025-05-07T19:44:39.2742159Z 2025-05-07T19:44:39.2742163Z 2025-05-07T19:44:39.2742166Z 2025-05-07T19:44:39.2742170Z 2025-05-07T19:44:39.2742343Z 2025-05-07T19:44:39.2742347Z 2025-05-07T19:44:39.2742350Z 2025-05-07T19:44:39.2742354Z 2025-05-07T19:44:39.2742357Z 2025-05-07T19:44:39.2742361Z 2025-05-07T19:44:39.2742364Z 2025-05-07T19:44:39.2742368Z 2025-05-07T19:44:39.2742371Z 2025-05-07T19:44:39.2742465Z 2025-05-07T19:44:39.2845436Z zlib-1.2.13 | 91 KB | ########## | 100%  2025-05-07T19:44:39.2845802Z 2025-05-07T19:44:39.2845832Z 2025-05-07T19:44:39.2845836Z 2025-05-07T19:44:39.2845840Z 2025-05-07T19:44:39.2845843Z 2025-05-07T19:44:39.2845846Z 2025-05-07T19:44:39.2845850Z 2025-05-07T19:44:39.2845853Z 2025-05-07T19:44:39.2845857Z 2025-05-07T19:44:39.2845860Z 2025-05-07T19:44:39.2845864Z 2025-05-07T19:44:39.2845867Z 2025-05-07T19:44:39.2845871Z 2025-05-07T19:44:39.2846195Z compiler-rt-16.0.6 | 107 KB | ########## | 100%  2025-05-07T19:44:39.2846638Z 2025-05-07T19:44:39.2846641Z 2025-05-07T19:44:39.2846645Z 2025-05-07T19:44:39.2846655Z 2025-05-07T19:44:39.2846659Z 2025-05-07T19:44:39.2846662Z 2025-05-07T19:44:39.2846666Z 2025-05-07T19:44:39.2846669Z 2025-05-07T19:44:39.2846672Z 2025-05-07T19:44:39.2846676Z 2025-05-07T19:44:39.2846679Z 2025-05-07T19:44:39.2846683Z 2025-05-07T19:44:39.2846686Z 2025-05-07T19:44:39.2906828Z compiler-rt-16.0.6 | 107 KB | ########## | 100%  2025-05-07T19:44:39.2907220Z 2025-05-07T19:44:39.2907225Z 2025-05-07T19:44:39.2907228Z 2025-05-07T19:44:39.2907232Z 2025-05-07T19:44:39.2907235Z 2025-05-07T19:44:39.2907239Z 2025-05-07T19:44:39.2907242Z 2025-05-07T19:44:39.2907245Z 2025-05-07T19:44:39.2907249Z 2025-05-07T19:44:39.2907253Z 2025-05-07T19:44:39.2907256Z 2025-05-07T19:44:39.2907259Z 2025-05-07T19:44:39.2907263Z 2025-05-07T19:44:39.2907266Z 
2025-05-07T19:44:39.2907270Z 2025-05-07T19:44:39.2909372Z libzlib-1.2.13 | 60 KB | ########## | 100%  2025-05-07T19:44:39.2909699Z 2025-05-07T19:44:39.2909711Z 2025-05-07T19:44:39.2909715Z 2025-05-07T19:44:39.2909718Z 2025-05-07T19:44:39.2909721Z 2025-05-07T19:44:39.2909725Z 2025-05-07T19:44:39.2909728Z 2025-05-07T19:44:39.2909732Z 2025-05-07T19:44:39.2909735Z 2025-05-07T19:44:39.2909760Z 2025-05-07T19:44:39.2909764Z 2025-05-07T19:44:39.2909767Z 2025-05-07T19:44:39.2909770Z 2025-05-07T19:44:39.2909778Z 2025-05-07T19:44:39.2909785Z 2025-05-07T19:44:39.3382227Z libzlib-1.2.13 | 60 KB | ########## | 100%  2025-05-07T19:44:39.3382579Z 2025-05-07T19:44:39.3565948Z compiler-rt_linux-64 | 36.0 MB | ########## | 100%  2025-05-07T19:44:39.3566252Z 2025-05-07T19:44:39.3566412Z 2025-05-07T19:44:39.3566420Z 2025-05-07T19:44:39.8446440Z libclang-cpp16-16.0. | 17.3 MB | ########## | 100%  2025-05-07T19:44:39.8446795Z 2025-05-07T19:44:39.8641891Z compiler-rt_linux-64 | 36.0 MB | ########## | 100%  2025-05-07T19:44:39.8642229Z 2025-05-07T19:44:39.8642403Z 2025-05-07T19:44:39.9006298Z libllvm16-16.0.6 | 33.7 MB | ########## | 100%  2025-05-07T19:44:39.9015225Z llvm-openmp-16.0.6 | 39.9 MB | ########## | 100% 2025-05-07T19:44:39.9015733Z 2025-05-07T19:44:39.9015998Z 2025-05-07T19:44:39.9016296Z  2025-05-07T19:44:39.9016519Z 2025-05-07T19:44:39.9016523Z 2025-05-07T19:44:39.9016723Z  2025-05-07T19:44:39.9016947Z 2025-05-07T19:44:39.9016951Z 2025-05-07T19:44:39.9016955Z 2025-05-07T19:44:39.9017239Z  2025-05-07T19:44:39.9017500Z 2025-05-07T19:44:39.9017504Z 2025-05-07T19:44:39.9017507Z 2025-05-07T19:44:39.9017511Z 2025-05-07T19:44:39.9017702Z  2025-05-07T19:44:39.9017935Z 2025-05-07T19:44:39.9017939Z 2025-05-07T19:44:39.9017942Z 2025-05-07T19:44:39.9018206Z 2025-05-07T19:44:39.9018210Z 2025-05-07T19:44:39.9018406Z  2025-05-07T19:44:39.9018638Z 2025-05-07T19:44:39.9018641Z 2025-05-07T19:44:39.9018645Z 2025-05-07T19:44:39.9018648Z 2025-05-07T19:44:39.9018652Z 2025-05-07T19:44:39.9018655Z 
2025-05-07T19:44:39.9023735Z  done 2025-05-07T19:44:40.0028954Z Preparing transaction: \ done 2025-05-07T19:44:40.1036741Z Verifying transaction: / done 2025-05-07T19:44:40.2054853Z Executing transaction: \ done 2025-05-07T19:44:40.3206653Z [INSTALL] Setting the C/C++ compiler symlinks ... 
2025-05-07T19:44:43.9926468Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:44:43.9926989Z 2025-05-07T19:44:43.9938681Z 2025-05-07T19:44:43.9956267Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:44:43.9956849Z 2025-05-07T19:44:43.9977875Z 2025-05-07T19:44:44.0003304Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang++ /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:44:44.0004001Z 2025-05-07T19:44:44.0019445Z 2025-05-07T19:44:44.0040758Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang++ /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:44:44.0042305Z 2025-05-07T19:44:44.0054827Z 2025-05-07T19:44:44.0055602Z + conda env config vars set -n build_binary CC= 2025-05-07T19:44:44.0056356Z 2025-05-07T19:44:44.4172279Z 2025-05-07T19:44:44.4173182Z + conda env config vars set -n build_binary CXX= 2025-05-07T19:44:44.4173958Z 2025-05-07T19:44:44.8337613Z 2025-05-07T19:44:44.8338606Z + conda run -n build_binary printenv CC 2025-05-07T19:44:44.8339360Z 2025-05-07T19:44:46.6216535Z /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-cc 2025-05-07T19:44:46.6218038Z 2025-05-07T19:44:46.6943229Z 2025-05-07T19:44:46.6944890Z + conda run -n build_binary printenv CXX 2025-05-07T19:44:46.6945591Z 2025-05-07T19:44:48.4569349Z /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ 2025-05-07T19:44:48.4569754Z 2025-05-07T19:44:48.5319046Z 2025-05-07T19:44:50.3939911Z [ENV] Appending to LD_LIBRARY_PATH: /github/home/miniconda/envs/build_binary/lib ... 2025-05-07T19:44:52.1745257Z ERROR conda.cli.main_run:execute(125): `conda run printenv LD_LIBRARY_PATH` failed. 
(See above for error) 2025-05-07T19:44:52.2364345Z + conda env config vars set -n build_binary LD_LIBRARY_PATH=/github/home/miniconda/envs/build_binary/lib 2025-05-07T19:44:52.2364843Z 2025-05-07T19:44:52.6598901Z 2025-05-07T19:44:54.4659500Z /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:44:54.4660397Z 2025-05-07T19:44:54.5393155Z [CHECK] Binary cc found in PATH 2025-05-07T19:44:56.3485719Z /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:44:56.3486494Z 2025-05-07T19:44:56.4211380Z [CHECK] Binary gcc found in PATH 2025-05-07T19:44:58.2362410Z /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:44:58.2362951Z 2025-05-07T19:44:58.3105940Z [CHECK] Binary c++ found in PATH 2025-05-07T19:45:00.1219878Z /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:45:00.1220268Z 2025-05-07T19:45:00.1803841Z [CHECK] Binary g++ found in PATH 2025-05-07T19:45:00.1804388Z [INFO] Printing out all preprocessor defines in the C compiler ... 2025-05-07T19:45:00.1804912Z + conda run -n build_binary cc -dM -E - 2025-05-07T19:45:00.1805133Z 2025-05-07T19:45:02.0373770Z #define _LP64 1 2025-05-07T19:45:02.0374606Z #define __ATOMIC_ACQUIRE 2 2025-05-07T19:45:02.0375213Z #define __ATOMIC_ACQ_REL 4 2025-05-07T19:45:02.0375531Z #define __ATOMIC_CONSUME 1 2025-05-07T19:45:02.0375815Z #define __ATOMIC_RELAXED 0 2025-05-07T19:45:02.0376158Z #define __ATOMIC_RELEASE 3 2025-05-07T19:45:02.0376451Z #define __ATOMIC_SEQ_CST 5 2025-05-07T19:45:02.0376774Z #define __BIGGEST_ALIGNMENT__ 16 2025-05-07T19:45:02.0377257Z #define __BITINT_MAXWIDTH__ 8388608 2025-05-07T19:45:02.0377798Z #define __BOOL_WIDTH__ 8 2025-05-07T19:45:02.0378578Z #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:45:02.0378946Z #define __CHAR16_TYPE__ unsigned short 2025-05-07T19:45:02.0379300Z #define __CHAR32_TYPE__ unsigned int 2025-05-07T19:45:02.0379603Z #define __CHAR_BIT__ 8 2025-05-07T19:45:02.0379906Z #define __CLANG_ATOMIC_BOOL_LOCK_FREE 2 
2025-05-07T19:45:02.0380428Z #define __CLANG_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:45:02.0380817Z #define __CLANG_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:45:02.0381160Z #define __CLANG_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:45:02.0381522Z #define __CLANG_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:45:02.0381884Z #define __CLANG_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:45:02.0382230Z #define __CLANG_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:45:02.0382600Z #define __CLANG_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:45:02.0382949Z #define __CLANG_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:45:02.0383318Z #define __CLANG_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:45:02.0383649Z #define __CONSTANT_CFSTRINGS__ 1 2025-05-07T19:45:02.0384108Z #define __DBL_DECIMAL_DIG__ 17 2025-05-07T19:45:02.0384425Z #define __DBL_DENORM_MIN__ 4.9406564584124654e-324 2025-05-07T19:45:02.0384789Z #define __DBL_DIG__ 15 2025-05-07T19:45:02.0385065Z #define __DBL_EPSILON__ 2.2204460492503131e-16 2025-05-07T19:45:02.0385418Z #define __DBL_HAS_DENORM__ 1 2025-05-07T19:45:02.0385734Z #define __DBL_HAS_INFINITY__ 1 2025-05-07T19:45:02.0386019Z #define __DBL_HAS_QUIET_NAN__ 1 2025-05-07T19:45:02.0386327Z #define __DBL_MANT_DIG__ 53 2025-05-07T19:45:02.0386600Z #define __DBL_MAX_10_EXP__ 308 2025-05-07T19:45:02.0386907Z #define __DBL_MAX_EXP__ 1024 2025-05-07T19:45:02.0387193Z #define __DBL_MAX__ 1.7976931348623157e+308 2025-05-07T19:45:02.0387543Z #define __DBL_MIN_10_EXP__ (-307) 2025-05-07T19:45:02.0387840Z #define __DBL_MIN_EXP__ (-1021) 2025-05-07T19:45:02.0388156Z #define __DBL_MIN__ 2.2250738585072014e-308 2025-05-07T19:45:02.0388518Z #define __DECIMAL_DIG__ __LDBL_DECIMAL_DIG__ 2025-05-07T19:45:02.0388835Z #define __ELF__ 1 2025-05-07T19:45:02.0389104Z #define __FINITE_MATH_ONLY__ 0 2025-05-07T19:45:02.0389390Z #define __FLOAT128__ 1 2025-05-07T19:45:02.0389670Z #define __FLT16_DECIMAL_DIG__ 5 2025-05-07T19:45:02.0389989Z #define __FLT16_DENORM_MIN__ 5.9604644775390625e-8F16 2025-05-07T19:45:02.0390354Z 
#define __FLT16_DIG__ 3 2025-05-07T19:45:02.0390628Z #define __FLT16_EPSILON__ 9.765625e-4F16 2025-05-07T19:45:02.0390969Z #define __FLT16_HAS_DENORM__ 1 2025-05-07T19:45:02.0391254Z #define __FLT16_HAS_INFINITY__ 1 2025-05-07T19:45:02.0391567Z #define __FLT16_HAS_QUIET_NAN__ 1 2025-05-07T19:45:02.0391885Z #define __FLT16_MANT_DIG__ 11 2025-05-07T19:45:02.0392276Z #define __FLT16_MAX_10_EXP__ 4 2025-05-07T19:45:02.0392568Z #define __FLT16_MAX_EXP__ 16 2025-05-07T19:45:02.0392835Z #define __FLT16_MAX__ 6.5504e+4F16 2025-05-07T19:45:02.0393143Z #define __FLT16_MIN_10_EXP__ (-4) 2025-05-07T19:45:02.0393426Z #define __FLT16_MIN_EXP__ (-13) 2025-05-07T19:45:02.0393727Z #define __FLT16_MIN__ 6.103515625e-5F16 2025-05-07T19:45:02.0394268Z #define __FLT_DECIMAL_DIG__ 9 2025-05-07T19:45:02.0394749Z #define __FLT_DENORM_MIN__ 1.40129846e-45F 2025-05-07T19:45:02.0395056Z #define __FLT_DIG__ 6 2025-05-07T19:45:02.0395340Z #define __FLT_EPSILON__ 1.19209290e-7F 2025-05-07T19:45:02.0395674Z #define __FLT_HAS_DENORM__ 1 2025-05-07T19:45:02.0395951Z #define __FLT_HAS_INFINITY__ 1 2025-05-07T19:45:02.0396267Z #define __FLT_HAS_QUIET_NAN__ 1 2025-05-07T19:45:02.0396547Z #define __FLT_MANT_DIG__ 24 2025-05-07T19:45:02.0396845Z #define __FLT_MAX_10_EXP__ 38 2025-05-07T19:45:02.0397127Z #define __FLT_MAX_EXP__ 128 2025-05-07T19:45:02.0397425Z #define __FLT_MAX__ 3.40282347e+38F 2025-05-07T19:45:02.0397722Z #define __FLT_MIN_10_EXP__ (-37) 2025-05-07T19:45:02.0398043Z #define __FLT_MIN_EXP__ (-125) 2025-05-07T19:45:02.0398330Z #define __FLT_MIN__ 1.17549435e-38F 2025-05-07T19:45:02.0398828Z #define __FLT_RADIX__ 2 2025-05-07T19:45:02.0399116Z #define __FXSR__ 1 2025-05-07T19:45:02.0399370Z #define __GCC_ASM_FLAG_OUTPUTS__ 1 2025-05-07T19:45:02.0399707Z #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:45:02.0400163Z #define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:45:02.0400507Z #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:45:02.0400824Z #define 
__GCC_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:45:02.0401144Z #define __GCC_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:45:02.0401447Z #define __GCC_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:45:02.0401870Z #define __GCC_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:45:02.0402219Z #define __GCC_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:45:02.0402556Z #define __GCC_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:45:02.0402923Z #define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 2025-05-07T19:45:02.0403266Z #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:45:02.0403638Z #define __GCC_HAVE_DWARF2_CFI_ASM 1 2025-05-07T19:45:02.0403980Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 2025-05-07T19:45:02.0404372Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 2025-05-07T19:45:02.0404730Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 2025-05-07T19:45:02.0405114Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 2025-05-07T19:45:02.0405447Z #define __GNUC_MINOR__ 2 2025-05-07T19:45:02.0405760Z #define __GNUC_PATCHLEVEL__ 1 2025-05-07T19:45:02.0406084Z #define __GNUC_STDC_INLINE__ 1 2025-05-07T19:45:02.0406365Z #define __GNUC__ 4 2025-05-07T19:45:02.0406646Z #define __GXX_ABI_VERSION 1002 2025-05-07T19:45:02.0406942Z #define __INT16_C_SUFFIX__ 2025-05-07T19:45:02.0407245Z #define __INT16_FMTd__ "hd" 2025-05-07T19:45:02.0407505Z #define __INT16_FMTi__ "hi" 2025-05-07T19:45:02.0407786Z #define __INT16_MAX__ 32767 2025-05-07T19:45:02.0408052Z #define __INT16_TYPE__ short 2025-05-07T19:45:02.0408355Z #define __INT32_C_SUFFIX__ 2025-05-07T19:45:02.0408618Z #define __INT32_FMTd__ "d" 2025-05-07T19:45:02.0408897Z #define __INT32_FMTi__ "i" 2025-05-07T19:45:02.0409293Z #define __INT32_MAX__ 2147483647 2025-05-07T19:45:02.0409560Z #define __INT32_TYPE__ int 2025-05-07T19:45:02.0409836Z #define __INT64_C_SUFFIX__ L 2025-05-07T19:45:02.0410205Z #define __INT64_FMTd__ "ld" 2025-05-07T19:45:02.0410467Z #define __INT64_FMTi__ "li" 2025-05-07T19:45:02.0410718Z #define __INT64_MAX__ 9223372036854775807L 2025-05-07T19:45:02.0411021Z 
#define __INT64_TYPE__ long int 2025-05-07T19:45:02.0411281Z #define __INT8_C_SUFFIX__ 2025-05-07T19:45:02.0411550Z #define __INT8_FMTd__ "hhd" 2025-05-07T19:45:02.0411804Z #define __INT8_FMTi__ "hhi" 2025-05-07T19:45:02.0412089Z #define __INT8_MAX__ 127 2025-05-07T19:45:02.0412379Z #define __INT8_TYPE__ signed char 2025-05-07T19:45:02.0412664Z #define __INTMAX_C_SUFFIX__ L 2025-05-07T19:45:02.0412956Z #define __INTMAX_FMTd__ "ld" 2025-05-07T19:45:02.0413221Z #define __INTMAX_FMTi__ "li" 2025-05-07T19:45:02.0413522Z #define __INTMAX_MAX__ 9223372036854775807L 2025-05-07T19:45:02.0413827Z #define __INTMAX_TYPE__ long int 2025-05-07T19:45:02.0414134Z #define __INTMAX_WIDTH__ 64 2025-05-07T19:45:02.0414402Z #define __INTPTR_FMTd__ "ld" 2025-05-07T19:45:02.0414706Z #define __INTPTR_FMTi__ "li" 2025-05-07T19:45:02.0414976Z #define __INTPTR_MAX__ 9223372036854775807L 2025-05-07T19:45:02.0415286Z #define __INTPTR_TYPE__ long int 2025-05-07T19:45:02.0415563Z #define __INTPTR_WIDTH__ 64 2025-05-07T19:45:02.0415809Z #define __INT_FAST16_FMTd__ "hd" 2025-05-07T19:45:02.0416084Z #define __INT_FAST16_FMTi__ "hi" 2025-05-07T19:45:02.0416342Z #define __INT_FAST16_MAX__ 32767 2025-05-07T19:45:02.0416616Z #define __INT_FAST16_TYPE__ short 2025-05-07T19:45:02.0416887Z #define __INT_FAST16_WIDTH__ 16 2025-05-07T19:45:02.0417276Z #define __INT_FAST32_FMTd__ "d" 2025-05-07T19:45:02.0417725Z #define __INT_FAST32_FMTi__ "i" 2025-05-07T19:45:02.0418027Z #define __INT_FAST32_MAX__ 2147483647 2025-05-07T19:45:02.0418331Z #define __INT_FAST32_TYPE__ int 2025-05-07T19:45:02.0418628Z #define __INT_FAST32_WIDTH__ 32 2025-05-07T19:45:02.0418927Z #define __INT_FAST64_FMTd__ "ld" 2025-05-07T19:45:02.0419211Z #define __INT_FAST64_FMTi__ "li" 2025-05-07T19:45:02.0419537Z #define __INT_FAST64_MAX__ 9223372036854775807L 2025-05-07T19:45:02.0419871Z #define __INT_FAST64_TYPE__ long int 2025-05-07T19:45:02.0420197Z #define __INT_FAST64_WIDTH__ 64 2025-05-07T19:45:02.0420636Z #define __INT_FAST8_FMTd__ 
"hhd" 2025-05-07T19:45:02.0420966Z #define __INT_FAST8_FMTi__ "hhi" 2025-05-07T19:45:02.0421258Z #define __INT_FAST8_MAX__ 127 2025-05-07T19:45:02.0421588Z #define __INT_FAST8_TYPE__ signed char 2025-05-07T19:45:02.0421908Z #define __INT_FAST8_WIDTH__ 8 2025-05-07T19:45:02.0422317Z #define __INT_LEAST16_FMTd__ "hd" 2025-05-07T19:45:02.0422642Z #define __INT_LEAST16_FMTi__ "hi" 2025-05-07T19:45:02.0422942Z #define __INT_LEAST16_MAX__ 32767 2025-05-07T19:45:02.0423269Z #define __INT_LEAST16_TYPE__ short 2025-05-07T19:45:02.0423573Z #define __INT_LEAST16_WIDTH__ 16 2025-05-07T19:45:02.0423897Z #define __INT_LEAST32_FMTd__ "d" 2025-05-07T19:45:02.0424192Z #define __INT_LEAST32_FMTi__ "i" 2025-05-07T19:45:02.0424513Z #define __INT_LEAST32_MAX__ 2147483647 2025-05-07T19:45:02.0424830Z #define __INT_LEAST32_TYPE__ int 2025-05-07T19:45:02.0425144Z #define __INT_LEAST32_WIDTH__ 32 2025-05-07T19:45:02.0425437Z #define __INT_LEAST64_FMTd__ "ld" 2025-05-07T19:45:02.0425767Z #define __INT_LEAST64_FMTi__ "li" 2025-05-07T19:45:02.0426118Z #define __INT_LEAST64_MAX__ 9223372036854775807L 2025-05-07T19:45:02.0426468Z #define __INT_LEAST64_TYPE__ long int 2025-05-07T19:45:02.0426804Z #define __INT_LEAST64_WIDTH__ 64 2025-05-07T19:45:02.0427098Z #define __INT_LEAST8_FMTd__ "hhd" 2025-05-07T19:45:02.0427424Z #define __INT_LEAST8_FMTi__ "hhi" 2025-05-07T19:45:02.0427723Z #define __INT_LEAST8_MAX__ 127 2025-05-07T19:45:02.0428041Z #define __INT_LEAST8_TYPE__ signed char 2025-05-07T19:45:02.0428363Z #define __INT_LEAST8_WIDTH__ 8 2025-05-07T19:45:02.0428671Z #define __INT_MAX__ 2147483647 2025-05-07T19:45:02.0428953Z #define __INT_WIDTH__ 32 2025-05-07T19:45:02.0429259Z #define __LDBL_DECIMAL_DIG__ 21 2025-05-07T19:45:02.0429630Z #define __LDBL_DENORM_MIN__ 3.64519953188247460253e-4951L 2025-05-07T19:45:02.0430001Z #define __LDBL_DIG__ 18 2025-05-07T19:45:02.0430336Z #define __LDBL_EPSILON__ 1.08420217248550443401e-19L 2025-05-07T19:45:02.0430695Z #define __LDBL_HAS_DENORM__ 1 
2025-05-07T19:45:02.0431030Z #define __LDBL_HAS_INFINITY__ 1 2025-05-07T19:45:02.0431326Z #define __LDBL_HAS_QUIET_NAN__ 1 2025-05-07T19:45:02.0431653Z #define __LDBL_MANT_DIG__ 64 2025-05-07T19:45:02.0431947Z #define __LDBL_MAX_10_EXP__ 4932 2025-05-07T19:45:02.0432268Z #define __LDBL_MAX_EXP__ 16384 2025-05-07T19:45:02.0432610Z #define __LDBL_MAX__ 1.18973149535723176502e+4932L 2025-05-07T19:45:02.0432961Z #define __LDBL_MIN_10_EXP__ (-4931) 2025-05-07T19:45:02.0433299Z #define __LDBL_MIN_EXP__ (-16381) 2025-05-07T19:45:02.0433653Z #define __LDBL_MIN__ 3.36210314311209350626e-4932L 2025-05-07T19:45:02.0434027Z #define __LITTLE_ENDIAN__ 1 2025-05-07T19:45:02.0434315Z #define __LLONG_WIDTH__ 64 2025-05-07T19:45:02.0434648Z #define __LONG_LONG_MAX__ 9223372036854775807LL 2025-05-07T19:45:02.0435002Z #define __LONG_MAX__ 9223372036854775807L 2025-05-07T19:45:02.0435353Z #define __LONG_WIDTH__ 64 2025-05-07T19:45:02.0435621Z #define __LP64__ 1 2025-05-07T19:45:02.0435897Z #define __MMX__ 1 2025-05-07T19:45:02.0436178Z #define __NO_INLINE__ 1 2025-05-07T19:45:02.0436451Z #define __NO_MATH_INLINES 1 2025-05-07T19:45:02.0436772Z #define __OBJC_BOOL_IS_BOOL 0 2025-05-07T19:45:02.0437103Z #define __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES 3 2025-05-07T19:45:02.0437504Z #define __OPENCL_MEMORY_SCOPE_DEVICE 2 2025-05-07T19:45:02.0437850Z #define __OPENCL_MEMORY_SCOPE_SUB_GROUP 4 2025-05-07T19:45:02.0438245Z #define __OPENCL_MEMORY_SCOPE_WORK_GROUP 1 2025-05-07T19:45:02.0438601Z #define __OPENCL_MEMORY_SCOPE_WORK_ITEM 0 2025-05-07T19:45:02.0439083Z #define __ORDER_BIG_ENDIAN__ 4321 2025-05-07T19:45:02.0439391Z #define __ORDER_LITTLE_ENDIAN__ 1234 2025-05-07T19:45:02.0439735Z #define __ORDER_PDP_ENDIAN__ 3412 2025-05-07T19:45:02.0440050Z #define __PIC__ 2 2025-05-07T19:45:02.0440282Z #define __PIE__ 2 2025-05-07T19:45:02.0440557Z #define __POINTER_WIDTH__ 64 2025-05-07T19:45:02.0440852Z #define __PRAGMA_REDEFINE_EXTNAME 1 2025-05-07T19:45:02.0441188Z #define __PTRDIFF_FMTd__ "ld" 
2025-05-07T19:45:02.0441580Z #define __PTRDIFF_FMTi__ "li" 2025-05-07T19:45:02.0441907Z #define __PTRDIFF_MAX__ 9223372036854775807L 2025-05-07T19:45:02.0442230Z #define __PTRDIFF_TYPE__ long int 2025-05-07T19:45:02.0442560Z #define __PTRDIFF_WIDTH__ 64 2025-05-07T19:45:02.0442840Z #define __REGISTER_PREFIX__ 2025-05-07T19:45:02.0443142Z #define __SCHAR_MAX__ 127 2025-05-07T19:45:02.0443530Z #define __SEG_FS 1 2025-05-07T19:45:02.0443762Z #define __SEG_GS 1 2025-05-07T19:45:02.0444026Z #define __SHRT_MAX__ 32767 2025-05-07T19:45:02.0444296Z #define __SHRT_WIDTH__ 16 2025-05-07T19:45:02.0444600Z #define __SIG_ATOMIC_MAX__ 2147483647 2025-05-07T19:45:02.0444907Z #define __SIG_ATOMIC_WIDTH__ 32 2025-05-07T19:45:02.0445219Z #define __SIZEOF_DOUBLE__ 8 2025-05-07T19:45:02.0445501Z #define __SIZEOF_FLOAT128__ 16 2025-05-07T19:45:02.0445806Z #define __SIZEOF_FLOAT__ 4 2025-05-07T19:45:02.0446082Z #define __SIZEOF_INT128__ 16 2025-05-07T19:45:02.0446392Z #define __SIZEOF_INT__ 4 2025-05-07T19:45:02.0446672Z #define __SIZEOF_LONG_DOUBLE__ 16 2025-05-07T19:45:02.0446957Z #define __SIZEOF_LONG_LONG__ 8 2025-05-07T19:45:02.0447237Z #define __SIZEOF_LONG__ 8 2025-05-07T19:45:02.0447487Z #define __SIZEOF_POINTER__ 8 2025-05-07T19:45:02.0447766Z #define __SIZEOF_PTRDIFF_T__ 8 2025-05-07T19:45:02.0448028Z #define __SIZEOF_SHORT__ 2 2025-05-07T19:45:02.0448296Z #define __SIZEOF_SIZE_T__ 8 2025-05-07T19:45:02.0448560Z #define __SIZEOF_WCHAR_T__ 4 2025-05-07T19:45:02.0448854Z #define __SIZEOF_WINT_T__ 4 2025-05-07T19:45:02.0449127Z #define __SIZE_FMTX__ "lX" 2025-05-07T19:45:02.0449420Z #define __SIZE_FMTo__ "lo" 2025-05-07T19:45:02.0449690Z #define __SIZE_FMTu__ "lu" 2025-05-07T19:45:02.0449990Z #define __SIZE_FMTx__ "lx" 2025-05-07T19:45:02.0450296Z #define __SIZE_MAX__ 18446744073709551615UL 2025-05-07T19:45:02.0450622Z #define __SIZE_TYPE__ long unsigned int 2025-05-07T19:45:02.0450951Z #define __SIZE_WIDTH__ 64 2025-05-07T19:45:02.0451201Z #define __SSE2_MATH__ 1 
2025-05-07T19:45:02.0451472Z #define __SSE2__ 1 2025-05-07T19:45:02.0451706Z #define __SSE_MATH__ 1 2025-05-07T19:45:02.0451975Z #define __SSE__ 1 2025-05-07T19:45:02.0452212Z #define __STDC_HOSTED__ 1 2025-05-07T19:45:02.0452500Z #define __STDC_UTF_16__ 1 2025-05-07T19:45:02.0452759Z #define __STDC_UTF_32__ 1 2025-05-07T19:45:02.0453053Z #define __STDC_VERSION__ 201710L 2025-05-07T19:45:02.0453354Z #define __STDC__ 1 2025-05-07T19:45:02.0453595Z #define __UINT16_C_SUFFIX__ 2025-05-07T19:45:02.0453892Z #define __UINT16_FMTX__ "hX" 2025-05-07T19:45:02.0454162Z #define __UINT16_FMTo__ "ho" 2025-05-07T19:45:02.0454450Z #define __UINT16_FMTu__ "hu" 2025-05-07T19:45:02.0454712Z #define __UINT16_FMTx__ "hx" 2025-05-07T19:45:02.0454988Z #define __UINT16_MAX__ 65535 2025-05-07T19:45:02.0455272Z #define __UINT16_TYPE__ unsigned short 2025-05-07T19:45:02.0455609Z #define __UINT32_C_SUFFIX__ U 2025-05-07T19:45:02.0455887Z #define __UINT32_FMTX__ "X" 2025-05-07T19:45:02.0456180Z #define __UINT32_FMTo__ "o" 2025-05-07T19:45:02.0456471Z #define __UINT32_FMTu__ "u" 2025-05-07T19:45:02.0456736Z #define __UINT32_FMTx__ "x" 2025-05-07T19:45:02.0457035Z #define __UINT32_MAX__ 4294967295U 2025-05-07T19:45:02.0457444Z #define __UINT32_TYPE__ unsigned int 2025-05-07T19:45:02.0457959Z #define __UINT64_C_SUFFIX__ UL 2025-05-07T19:45:02.0458332Z #define __UINT64_FMTX__ "lX" 2025-05-07T19:45:02.0458640Z #define __UINT64_FMTo__ "lo" 2025-05-07T19:45:02.0458922Z #define __UINT64_FMTu__ "lu" 2025-05-07T19:45:02.0459224Z #define __UINT64_FMTx__ "lx" 2025-05-07T19:45:02.0459521Z #define __UINT64_MAX__ 18446744073709551615UL 2025-05-07T19:45:02.0459895Z #define __UINT64_TYPE__ long unsigned int 2025-05-07T19:45:02.0460245Z #define __UINT8_C_SUFFIX__ 2025-05-07T19:45:02.0460527Z #define __UINT8_FMTX__ "hhX" 2025-05-07T19:45:02.0460842Z #define __UINT8_FMTo__ "hho" 2025-05-07T19:45:02.0461125Z #define __UINT8_FMTu__ "hhu" 2025-05-07T19:45:02.0461450Z #define __UINT8_FMTx__ "hhx" 
2025-05-07T19:45:02.0461725Z #define __UINT8_MAX__ 255 2025-05-07T19:45:02.0462031Z #define __UINT8_TYPE__ unsigned char 2025-05-07T19:45:02.0462344Z #define __UINTMAX_C_SUFFIX__ UL 2025-05-07T19:45:02.0462793Z #define __UINTMAX_FMTX__ "lX" 2025-05-07T19:45:02.0463085Z #define __UINTMAX_FMTo__ "lo" 2025-05-07T19:45:02.0463398Z #define __UINTMAX_FMTu__ "lu" 2025-05-07T19:45:02.0463715Z #define __UINTMAX_FMTx__ "lx" 2025-05-07T19:45:02.0464030Z #define __UINTMAX_MAX__ 18446744073709551615UL 2025-05-07T19:45:02.0464505Z #define __UINTMAX_TYPE__ long unsigned int 2025-05-07T19:45:02.0464848Z #define __UINTMAX_WIDTH__ 64 2025-05-07T19:45:02.0465166Z #define __UINTPTR_FMTX__ "lX" 2025-05-07T19:45:02.0465463Z #define __UINTPTR_FMTo__ "lo" 2025-05-07T19:45:02.0465788Z #define __UINTPTR_FMTu__ "lu" 2025-05-07T19:45:02.0466082Z #define __UINTPTR_FMTx__ "lx" 2025-05-07T19:45:02.0466427Z #define __UINTPTR_MAX__ 18446744073709551615UL 2025-05-07T19:45:02.0466785Z #define __UINTPTR_TYPE__ long unsigned int 2025-05-07T19:45:02.0467153Z #define __UINTPTR_WIDTH__ 64 2025-05-07T19:45:02.0467709Z #define __UINT_FAST16_FMTX__ "hX" 2025-05-07T19:45:02.0468026Z #define __UINT_FAST16_FMTo__ "ho" 2025-05-07T19:45:02.0468374Z #define __UINT_FAST16_FMTu__ "hu" 2025-05-07T19:45:02.0468683Z #define __UINT_FAST16_FMTx__ "hx" 2025-05-07T19:45:02.0469022Z #define __UINT_FAST16_MAX__ 65535 2025-05-07T19:45:02.0469349Z #define __UINT_FAST16_TYPE__ unsigned short 2025-05-07T19:45:02.0469727Z #define __UINT_FAST32_FMTX__ "X" 2025-05-07T19:45:02.0470028Z #define __UINT_FAST32_FMTo__ "o" 2025-05-07T19:45:02.0470364Z #define __UINT_FAST32_FMTu__ "u" 2025-05-07T19:45:02.0470661Z #define __UINT_FAST32_FMTx__ "x" 2025-05-07T19:45:02.0470997Z #define __UINT_FAST32_MAX__ 4294967295U 2025-05-07T19:45:02.0471361Z #define __UINT_FAST32_TYPE__ unsigned int 2025-05-07T19:45:02.0471695Z #define __UINT_FAST64_FMTX__ "lX" 2025-05-07T19:45:02.0472028Z #define __UINT_FAST64_FMTo__ "lo" 2025-05-07T19:45:02.0472333Z 
#define __UINT_FAST64_FMTu__ "lu" 2025-05-07T19:45:02.0472667Z #define __UINT_FAST64_FMTx__ "lx" 2025-05-07T19:45:02.0473002Z #define __UINT_FAST64_MAX__ 18446744073709551615UL 2025-05-07T19:45:02.0473419Z #define __UINT_FAST64_TYPE__ long unsigned int 2025-05-07T19:45:02.0473762Z #define __UINT_FAST8_FMTX__ "hhX" 2025-05-07T19:45:02.0474085Z #define __UINT_FAST8_FMTo__ "hho" 2025-05-07T19:45:02.0474405Z #define __UINT_FAST8_FMTu__ "hhu" 2025-05-07T19:45:02.0474699Z #define __UINT_FAST8_FMTx__ "hhx" 2025-05-07T19:45:02.0475031Z #define __UINT_FAST8_MAX__ 255 2025-05-07T19:45:02.0475322Z #define __UINT_FAST8_TYPE__ unsigned char 2025-05-07T19:45:02.0475679Z #define __UINT_LEAST16_FMTX__ "hX" 2025-05-07T19:45:02.0475990Z #define __UINT_LEAST16_FMTo__ "ho" 2025-05-07T19:45:02.0476317Z #define __UINT_LEAST16_FMTu__ "hu" 2025-05-07T19:45:02.0476626Z #define __UINT_LEAST16_FMTx__ "hx" 2025-05-07T19:45:02.0476958Z #define __UINT_LEAST16_MAX__ 65535 2025-05-07T19:45:02.0477283Z #define __UINT_LEAST16_TYPE__ unsigned short 2025-05-07T19:45:02.0477649Z #define __UINT_LEAST32_FMTX__ "X" 2025-05-07T19:45:02.0477981Z #define __UINT_LEAST32_FMTo__ "o" 2025-05-07T19:45:02.0478289Z #define __UINT_LEAST32_FMTu__ "u" 2025-05-07T19:45:02.0478633Z #define __UINT_LEAST32_FMTx__ "x" 2025-05-07T19:45:02.0478936Z #define __UINT_LEAST32_MAX__ 4294967295U 2025-05-07T19:45:02.0479417Z #define __UINT_LEAST32_TYPE__ unsigned int 2025-05-07T19:45:02.0479729Z #define __UINT_LEAST64_FMTX__ "lX" 2025-05-07T19:45:02.0480044Z #define __UINT_LEAST64_FMTo__ "lo" 2025-05-07T19:45:02.0480335Z #define __UINT_LEAST64_FMTu__ "lu" 2025-05-07T19:45:02.0480646Z #define __UINT_LEAST64_FMTx__ "lx" 2025-05-07T19:45:02.0480959Z #define __UINT_LEAST64_MAX__ 18446744073709551615UL 2025-05-07T19:45:02.0481338Z #define __UINT_LEAST64_TYPE__ long unsigned int 2025-05-07T19:45:02.0481695Z #define __UINT_LEAST8_FMTX__ "hhX" 2025-05-07T19:45:02.0482163Z #define __UINT_LEAST8_FMTo__ "hho" 2025-05-07T19:45:02.0482489Z 
#define __UINT_LEAST8_FMTu__ "hhu" 2025-05-07T19:45:02.0482782Z #define __UINT_LEAST8_FMTx__ "hhx" 2025-05-07T19:45:02.0483102Z #define __UINT_LEAST8_MAX__ 255 2025-05-07T19:45:02.0483403Z #define __UINT_LEAST8_TYPE__ unsigned char 2025-05-07T19:45:02.0483939Z #define __USER_LABEL_PREFIX__ 2025-05-07T19:45:02.0484581Z #define __VERSION__ "Clang 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:45:02.0485258Z #define __WCHAR_MAX__ 2147483647 2025-05-07T19:45:02.0485566Z #define __WCHAR_TYPE__ int 2025-05-07T19:45:02.0485941Z #define __WCHAR_WIDTH__ 32 2025-05-07T19:45:02.0486251Z #define __WINT_MAX__ 4294967295U 2025-05-07T19:45:02.0486543Z #define __WINT_TYPE__ unsigned int 2025-05-07T19:45:02.0486872Z #define __WINT_UNSIGNED__ 1 2025-05-07T19:45:02.0487151Z #define __WINT_WIDTH__ 32 2025-05-07T19:45:02.0487431Z #define __amd64 1 2025-05-07T19:45:02.0487671Z #define __amd64__ 1 2025-05-07T19:45:02.0487931Z #define __clang__ 1 2025-05-07T19:45:02.0488195Z #define __clang_literal_encoding__ "UTF-8" 2025-05-07T19:45:02.0488694Z #define __clang_major__ 16 2025-05-07T19:45:02.0488980Z #define __clang_minor__ 0 2025-05-07T19:45:02.0489249Z #define __clang_patchlevel__ 6 2025-05-07T19:45:02.0489885Z #define __clang_version__ "16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:45:02.0490565Z #define __clang_wide_literal_encoding__ "UTF-32" 2025-05-07T19:45:02.0490934Z #define __code_model_small__ 1 2025-05-07T19:45:02.0491211Z #define __gnu_linux__ 1 2025-05-07T19:45:02.0491480Z #define __k8 1 2025-05-07T19:45:02.0491700Z #define __k8__ 1 2025-05-07T19:45:02.0491947Z #define __linux 1 2025-05-07T19:45:02.0492196Z #define __linux__ 1 2025-05-07T19:45:02.0492425Z #define __llvm__ 1 2025-05-07T19:45:02.0492682Z #define __pic__ 2 2025-05-07T19:45:02.0492906Z #define __pie__ 2 2025-05-07T19:45:02.0493211Z #define __seg_fs 
__attribute__((address_space(257))) 2025-05-07T19:45:02.0493604Z #define __seg_gs __attribute__((address_space(256))) 2025-05-07T19:45:02.0494077Z #define __tune_k8__ 1 2025-05-07T19:45:02.0494308Z #define __unix 1 2025-05-07T19:45:02.0494548Z #define __unix__ 1 2025-05-07T19:45:02.0494770Z #define __x86_64 1 2025-05-07T19:45:02.0495016Z #define __x86_64__ 1 2025-05-07T19:45:02.0495250Z #define linux 1 2025-05-07T19:45:02.0495497Z #define unix 1 2025-05-07T19:45:02.0495627Z 2025-05-07T19:45:02.1128455Z 2025-05-07T19:45:02.1128969Z [INFO] Printing out all preprocessor defines in the C++ compiler ... 2025-05-07T19:45:02.1129486Z + conda run -n build_binary c++ -dM -E -x c++ - 2025-05-07T19:45:02.1129784Z 2025-05-07T19:45:03.9642493Z #define _GNU_SOURCE 1 2025-05-07T19:45:03.9642826Z #define _LP64 1 2025-05-07T19:45:03.9643118Z #define __ATOMIC_ACQUIRE 2 2025-05-07T19:45:03.9643423Z #define __ATOMIC_ACQ_REL 4 2025-05-07T19:45:03.9643749Z #define __ATOMIC_CONSUME 1 2025-05-07T19:45:03.9644041Z #define __ATOMIC_RELAXED 0 2025-05-07T19:45:03.9644409Z #define __ATOMIC_RELEASE 3 2025-05-07T19:45:03.9644692Z #define __ATOMIC_SEQ_CST 5 2025-05-07T19:45:03.9645021Z #define __BIGGEST_ALIGNMENT__ 16 2025-05-07T19:45:03.9645343Z #define __BITINT_MAXWIDTH__ 8388608 2025-05-07T19:45:03.9645700Z #define __BOOL_WIDTH__ 8 2025-05-07T19:45:03.9646075Z #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:45:03.9646449Z #define __CHAR16_TYPE__ unsigned short 2025-05-07T19:45:03.9646930Z #define __CHAR32_TYPE__ unsigned int 2025-05-07T19:45:03.9647244Z #define __CHAR_BIT__ 8 2025-05-07T19:45:03.9647574Z #define __CLANG_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:45:03.9647953Z #define __CLANG_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:45:03.9648342Z #define __CLANG_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:45:03.9648684Z #define __CLANG_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:45:03.9649046Z #define __CLANG_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:45:03.9649397Z #define 
__CLANG_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:45:03.9649729Z #define __CLANG_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:45:03.9650094Z #define __CLANG_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:45:03.9650438Z #define __CLANG_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:45:03.9650809Z #define __CLANG_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:45:03.9651144Z #define __CONSTANT_CFSTRINGS__ 1 2025-05-07T19:45:03.9651752Z #define __DBL_DECIMAL_DIG__ 17 2025-05-07T19:45:03.9652082Z #define __DBL_DENORM_MIN__ 4.9406564584124654e-324 2025-05-07T19:45:03.9652476Z #define __DBL_DIG__ 15 2025-05-07T19:45:03.9652797Z #define __DBL_EPSILON__ 2.2204460492503131e-16 2025-05-07T19:45:03.9653149Z #define __DBL_HAS_DENORM__ 1 2025-05-07T19:45:03.9653612Z #define __DBL_HAS_INFINITY__ 1 2025-05-07T19:45:03.9653917Z #define __DBL_HAS_QUIET_NAN__ 1 2025-05-07T19:45:03.9654240Z #define __DBL_MANT_DIG__ 53 2025-05-07T19:45:03.9654531Z #define __DBL_MAX_10_EXP__ 308 2025-05-07T19:45:03.9654858Z #define __DBL_MAX_EXP__ 1024 2025-05-07T19:45:03.9655157Z #define __DBL_MAX__ 1.7976931348623157e+308 2025-05-07T19:45:03.9655518Z #define __DBL_MIN_10_EXP__ (-307) 2025-05-07T19:45:03.9655825Z #define __DBL_MIN_EXP__ (-1021) 2025-05-07T19:45:03.9656160Z #define __DBL_MIN__ 2.2250738585072014e-308 2025-05-07T19:45:03.9656538Z #define __DECIMAL_DIG__ __LDBL_DECIMAL_DIG__ 2025-05-07T19:45:03.9656871Z #define __DEPRECATED 1 2025-05-07T19:45:03.9657277Z #define __ELF__ 1 2025-05-07T19:45:03.9657536Z #define __EXCEPTIONS 1 2025-05-07T19:45:03.9657848Z #define __FINITE_MATH_ONLY__ 0 2025-05-07T19:45:03.9658202Z #define __FLOAT128__ 1 2025-05-07T19:45:03.9658510Z #define __FLT16_DECIMAL_DIG__ 5 2025-05-07T19:45:03.9658847Z #define __FLT16_DENORM_MIN__ 5.9604644775390625e-8F16 2025-05-07T19:45:03.9659235Z #define __FLT16_DIG__ 3 2025-05-07T19:45:03.9659513Z #define __FLT16_EPSILON__ 9.765625e-4F16 2025-05-07T19:45:03.9659861Z #define __FLT16_HAS_DENORM__ 1 2025-05-07T19:45:03.9660187Z #define __FLT16_HAS_INFINITY__ 1 
2025-05-07T19:45:03.9660487Z #define __FLT16_HAS_QUIET_NAN__ 1 2025-05-07T19:45:03.9660800Z #define __FLT16_MANT_DIG__ 11 2025-05-07T19:45:03.9661077Z #define __FLT16_MAX_10_EXP__ 4 2025-05-07T19:45:03.9661373Z #define __FLT16_MAX_EXP__ 16 2025-05-07T19:45:03.9661641Z #define __FLT16_MAX__ 6.5504e+4F16 2025-05-07T19:45:03.9661946Z #define __FLT16_MIN_10_EXP__ (-4) 2025-05-07T19:45:03.9662231Z #define __FLT16_MIN_EXP__ (-13) 2025-05-07T19:45:03.9662546Z #define __FLT16_MIN__ 6.103515625e-5F16 2025-05-07T19:45:03.9662861Z #define __FLT_DECIMAL_DIG__ 9 2025-05-07T19:45:03.9663196Z #define __FLT_DENORM_MIN__ 1.40129846e-45F 2025-05-07T19:45:03.9663651Z #define __FLT_DIG__ 6 2025-05-07T19:45:03.9663911Z #define __FLT_EPSILON__ 1.19209290e-7F 2025-05-07T19:45:03.9664246Z #define __FLT_HAS_DENORM__ 1 2025-05-07T19:45:03.9664529Z #define __FLT_HAS_INFINITY__ 1 2025-05-07T19:45:03.9664837Z #define __FLT_HAS_QUIET_NAN__ 1 2025-05-07T19:45:03.9665123Z #define __FLT_MANT_DIG__ 24 2025-05-07T19:45:03.9665421Z #define __FLT_MAX_10_EXP__ 38 2025-05-07T19:45:03.9665703Z #define __FLT_MAX_EXP__ 128 2025-05-07T19:45:03.9666012Z #define __FLT_MAX__ 3.40282347e+38F 2025-05-07T19:45:03.9666312Z #define __FLT_MIN_10_EXP__ (-37) 2025-05-07T19:45:03.9666637Z #define __FLT_MIN_EXP__ (-125) 2025-05-07T19:45:03.9666949Z #define __FLT_MIN__ 1.17549435e-38F 2025-05-07T19:45:03.9667244Z #define __FLT_RADIX__ 2 2025-05-07T19:45:03.9668106Z #define __FXSR__ 1 2025-05-07T19:45:03.9668409Z #define __GCC_ASM_FLAG_OUTPUTS__ 1 2025-05-07T19:45:03.9668764Z #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:45:03.9669141Z #define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:45:03.9669522Z #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:45:03.9669868Z #define __GCC_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:45:03.9670234Z #define __GCC_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:45:03.9670566Z #define __GCC_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:45:03.9670931Z #define __GCC_ATOMIC_LONG_LOCK_FREE 2 
2025-05-07T19:45:03.9671285Z #define __GCC_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:45:03.9671621Z #define __GCC_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:45:03.9671991Z #define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 2025-05-07T19:45:03.9672343Z #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:45:03.9672707Z #define __GCC_HAVE_DWARF2_CFI_ASM 1 2025-05-07T19:45:03.9673041Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 2025-05-07T19:45:03.9673432Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 2025-05-07T19:45:03.9673949Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 2025-05-07T19:45:03.9674350Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 2025-05-07T19:45:03.9674737Z #define __GLIBCXX_BITSIZE_INT_N_0 128 2025-05-07T19:45:03.9675179Z #define __GLIBCXX_TYPE_INT_N_0 __int128 2025-05-07T19:45:03.9675540Z #define __GNUC_GNU_INLINE__ 1 2025-05-07T19:45:03.9675918Z #define __GNUC_MINOR__ 2 2025-05-07T19:45:03.9676225Z #define __GNUC_PATCHLEVEL__ 1 2025-05-07T19:45:03.9676512Z #define __GNUC__ 4 2025-05-07T19:45:03.9676792Z #define __GNUG__ 4 2025-05-07T19:45:03.9677050Z #define __GXX_ABI_VERSION 1002 2025-05-07T19:45:03.9677391Z #define __GXX_EXPERIMENTAL_CXX0X__ 1 2025-05-07T19:45:03.9677703Z #define __GXX_RTTI 1 2025-05-07T19:45:03.9677987Z #define __GXX_WEAK__ 1 2025-05-07T19:45:03.9678289Z #define __INT16_C_SUFFIX__ 2025-05-07T19:45:03.9678573Z #define __INT16_FMTd__ "hd" 2025-05-07T19:45:03.9678881Z #define __INT16_FMTi__ "hi" 2025-05-07T19:45:03.9679157Z #define __INT16_MAX__ 32767 2025-05-07T19:45:03.9679468Z #define __INT16_TYPE__ short 2025-05-07T19:45:03.9679765Z #define __INT32_C_SUFFIX__ 2025-05-07T19:45:03.9680064Z #define __INT32_FMTd__ "d" 2025-05-07T19:45:03.9680337Z #define __INT32_FMTi__ "i" 2025-05-07T19:45:03.9680653Z #define __INT32_MAX__ 2147483647 2025-05-07T19:45:03.9680951Z #define __INT32_TYPE__ int 2025-05-07T19:45:03.9681252Z #define __INT64_C_SUFFIX__ L 2025-05-07T19:45:03.9681557Z #define __INT64_FMTd__ "ld" 2025-05-07T19:45:03.9681831Z #define 
__INT64_FMTi__ "li" 2025-05-07T19:45:03.9682136Z #define __INT64_MAX__ 9223372036854775807L 2025-05-07T19:45:03.9682454Z #define __INT64_TYPE__ long int 2025-05-07T19:45:03.9682764Z #define __INT8_C_SUFFIX__ 2025-05-07T19:45:03.9683028Z #define __INT8_FMTd__ "hhd" 2025-05-07T19:45:03.9683324Z #define __INT8_FMTi__ "hhi" 2025-05-07T19:45:03.9683591Z #define __INT8_MAX__ 127 2025-05-07T19:45:03.9683888Z #define __INT8_TYPE__ signed char 2025-05-07T19:45:03.9684184Z #define __INTMAX_C_SUFFIX__ L 2025-05-07T19:45:03.9684499Z #define __INTMAX_FMTd__ "ld" 2025-05-07T19:45:03.9684783Z #define __INTMAX_FMTi__ "li" 2025-05-07T19:45:03.9685103Z #define __INTMAX_MAX__ 9223372036854775807L 2025-05-07T19:45:03.9703379Z #define __INTMAX_TYPE__ long int 2025-05-07T19:45:03.9703755Z #define __INTMAX_WIDTH__ 64 2025-05-07T19:45:03.9704066Z #define __INTPTR_FMTd__ "ld" 2025-05-07T19:45:03.9704418Z #define __INTPTR_FMTi__ "li" 2025-05-07T19:45:03.9704750Z #define __INTPTR_MAX__ 9223372036854775807L 2025-05-07T19:45:03.9705124Z #define __INTPTR_TYPE__ long int 2025-05-07T19:45:03.9705430Z #define __INTPTR_WIDTH__ 64 2025-05-07T19:45:03.9705757Z #define __INT_FAST16_FMTd__ "hd" 2025-05-07T19:45:03.9706093Z #define __INT_FAST16_FMTi__ "hi" 2025-05-07T19:45:03.9706395Z #define __INT_FAST16_MAX__ 32767 2025-05-07T19:45:03.9706731Z #define __INT_FAST16_TYPE__ short 2025-05-07T19:45:03.9707037Z #define __INT_FAST16_WIDTH__ 16 2025-05-07T19:45:03.9707368Z #define __INT_FAST32_FMTd__ "d" 2025-05-07T19:45:03.9707664Z #define __INT_FAST32_FMTi__ "i" 2025-05-07T19:45:03.9707996Z #define __INT_FAST32_MAX__ 2147483647 2025-05-07T19:45:03.9708333Z #define __INT_FAST32_TYPE__ int 2025-05-07T19:45:03.9708640Z #define __INT_FAST32_WIDTH__ 32 2025-05-07T19:45:03.9708901Z #define __INT_FAST64_FMTd__ "ld" 2025-05-07T19:45:03.9709172Z #define __INT_FAST64_FMTi__ "li" 2025-05-07T19:45:03.9709594Z #define __INT_FAST64_MAX__ 9223372036854775807L 2025-05-07T19:45:03.9709968Z #define __INT_FAST64_TYPE__ long 
int 2025-05-07T19:45:03.9710298Z #define __INT_FAST64_WIDTH__ 64 2025-05-07T19:45:03.9710576Z #define __INT_FAST8_FMTd__ "hhd" 2025-05-07T19:45:03.9710883Z #define __INT_FAST8_FMTi__ "hhi" 2025-05-07T19:45:03.9711168Z #define __INT_FAST8_MAX__ 127 2025-05-07T19:45:03.9711481Z #define __INT_FAST8_TYPE__ signed char 2025-05-07T19:45:03.9711783Z #define __INT_FAST8_WIDTH__ 8 2025-05-07T19:45:03.9712088Z #define __INT_LEAST16_FMTd__ "hd" 2025-05-07T19:45:03.9712376Z #define __INT_LEAST16_FMTi__ "hi" 2025-05-07T19:45:03.9712692Z #define __INT_LEAST16_MAX__ 32767 2025-05-07T19:45:03.9712982Z #define __INT_LEAST16_TYPE__ short 2025-05-07T19:45:03.9713448Z #define __INT_LEAST16_WIDTH__ 16 2025-05-07T19:45:03.9713768Z #define __INT_LEAST32_FMTd__ "d" 2025-05-07T19:45:03.9714057Z #define __INT_LEAST32_FMTi__ "i" 2025-05-07T19:45:03.9714378Z #define __INT_LEAST32_MAX__ 2147483647 2025-05-07T19:45:03.9714683Z #define __INT_LEAST32_TYPE__ int 2025-05-07T19:45:03.9715064Z #define __INT_LEAST32_WIDTH__ 32 2025-05-07T19:45:03.9715345Z #define __INT_LEAST64_FMTd__ "ld" 2025-05-07T19:45:03.9715653Z #define __INT_LEAST64_FMTi__ "li" 2025-05-07T19:45:03.9715958Z #define __INT_LEAST64_MAX__ 9223372036854775807L 2025-05-07T19:45:03.9716323Z #define __INT_LEAST64_TYPE__ long int 2025-05-07T19:45:03.9716623Z #define __INT_LEAST64_WIDTH__ 64 2025-05-07T19:45:03.9716938Z #define __INT_LEAST8_FMTd__ "hhd" 2025-05-07T19:45:03.9717251Z #define __INT_LEAST8_FMTi__ "hhi" 2025-05-07T19:45:03.9717534Z #define __INT_LEAST8_MAX__ 127 2025-05-07T19:45:03.9717840Z #define __INT_LEAST8_TYPE__ signed char 2025-05-07T19:45:03.9718147Z #define __INT_LEAST8_WIDTH__ 8 2025-05-07T19:45:03.9718448Z #define __INT_MAX__ 2147483647 2025-05-07T19:45:03.9718712Z #define __INT_WIDTH__ 32 2025-05-07T19:45:03.9718997Z #define __LDBL_DECIMAL_DIG__ 21 2025-05-07T19:45:03.9719324Z #define __LDBL_DENORM_MIN__ 3.64519953188247460253e-4951L 2025-05-07T19:45:03.9719697Z #define __LDBL_DIG__ 18 2025-05-07T19:45:03.9719980Z 
#define __LDBL_EPSILON__ 1.08420217248550443401e-19L 2025-05-07T19:45:03.9720345Z #define __LDBL_HAS_DENORM__ 1 2025-05-07T19:45:03.9720652Z #define __LDBL_HAS_INFINITY__ 1 2025-05-07T19:45:03.9720930Z #define __LDBL_HAS_QUIET_NAN__ 1 2025-05-07T19:45:03.9721232Z #define __LDBL_MANT_DIG__ 64 2025-05-07T19:45:03.9721499Z #define __LDBL_MAX_10_EXP__ 4932 2025-05-07T19:45:03.9721800Z #define __LDBL_MAX_EXP__ 16384 2025-05-07T19:45:03.9722097Z #define __LDBL_MAX__ 1.18973149535723176502e+4932L 2025-05-07T19:45:03.9722453Z #define __LDBL_MIN_10_EXP__ (-4931) 2025-05-07T19:45:03.9722743Z #define __LDBL_MIN_EXP__ (-16381) 2025-05-07T19:45:03.9723081Z #define __LDBL_MIN__ 3.36210314311209350626e-4932L 2025-05-07T19:45:03.9723436Z #define __LITTLE_ENDIAN__ 1 2025-05-07T19:45:03.9723705Z #define __LLONG_WIDTH__ 64 2025-05-07T19:45:03.9724019Z #define __LONG_LONG_MAX__ 9223372036854775807LL 2025-05-07T19:45:03.9724351Z #define __LONG_MAX__ 9223372036854775807L 2025-05-07T19:45:03.9724680Z #define __LONG_WIDTH__ 64 2025-05-07T19:45:03.9724938Z #define __LP64__ 1 2025-05-07T19:45:03.9725201Z #define __MMX__ 1 2025-05-07T19:45:03.9725434Z #define __NO_INLINE__ 1 2025-05-07T19:45:03.9725721Z #define __NO_MATH_INLINES 1 2025-05-07T19:45:03.9725992Z #define __OBJC_BOOL_IS_BOOL 0 2025-05-07T19:45:03.9726327Z #define __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES 3 2025-05-07T19:45:03.9726696Z #define __OPENCL_MEMORY_SCOPE_DEVICE 2 2025-05-07T19:45:03.9727016Z #define __OPENCL_MEMORY_SCOPE_SUB_GROUP 4 2025-05-07T19:45:03.9727375Z #define __OPENCL_MEMORY_SCOPE_WORK_GROUP 1 2025-05-07T19:45:03.9727706Z #define __OPENCL_MEMORY_SCOPE_WORK_ITEM 0 2025-05-07T19:45:03.9728056Z #define __ORDER_BIG_ENDIAN__ 4321 2025-05-07T19:45:03.9728357Z #define __ORDER_LITTLE_ENDIAN__ 1234 2025-05-07T19:45:03.9728682Z #define __ORDER_PDP_ENDIAN__ 3412 2025-05-07T19:45:03.9728968Z #define __PIC__ 2 2025-05-07T19:45:03.9729232Z #define __PIE__ 2 2025-05-07T19:45:03.9729472Z #define __POINTER_WIDTH__ 64 
2025-05-07T19:45:03.9729797Z #define __PRAGMA_REDEFINE_EXTNAME 1 2025-05-07T19:45:03.9730122Z #define __PTRDIFF_FMTd__ "ld" 2025-05-07T19:45:03.9730400Z #define __PTRDIFF_FMTi__ "li" 2025-05-07T19:45:03.9730719Z #define __PTRDIFF_MAX__ 9223372036854775807L 2025-05-07T19:45:03.9731038Z #define __PTRDIFF_TYPE__ long int 2025-05-07T19:45:03.9731361Z #define __PTRDIFF_WIDTH__ 64 2025-05-07T19:45:03.9731637Z #define __REGISTER_PREFIX__ 2025-05-07T19:45:03.9731941Z #define __SCHAR_MAX__ 127 2025-05-07T19:45:03.9732191Z #define __SEG_FS 1 2025-05-07T19:45:03.9732459Z #define __SEG_GS 1 2025-05-07T19:45:03.9732695Z #define __SHRT_MAX__ 32767 2025-05-07T19:45:03.9732991Z #define __SHRT_WIDTH__ 16 2025-05-07T19:45:03.9733413Z #define __SIG_ATOMIC_MAX__ 2147483647 2025-05-07T19:45:03.9733721Z #define __SIG_ATOMIC_WIDTH__ 32 2025-05-07T19:45:03.9734034Z #define __SIZEOF_DOUBLE__ 8 2025-05-07T19:45:03.9734311Z #define __SIZEOF_FLOAT128__ 16 2025-05-07T19:45:03.9734621Z #define __SIZEOF_FLOAT__ 4 2025-05-07T19:45:03.9734887Z #define __SIZEOF_INT128__ 16 2025-05-07T19:45:03.9735248Z #define __SIZEOF_INT__ 4 2025-05-07T19:45:03.9735508Z #define __SIZEOF_LONG_DOUBLE__ 16 2025-05-07T19:45:03.9735825Z #define __SIZEOF_LONG_LONG__ 8 2025-05-07T19:45:03.9736098Z #define __SIZEOF_LONG__ 8 2025-05-07T19:45:03.9736388Z #define __SIZEOF_POINTER__ 8 2025-05-07T19:45:03.9736657Z #define __SIZEOF_PTRDIFF_T__ 8 2025-05-07T19:45:03.9736952Z #define __SIZEOF_SHORT__ 2 2025-05-07T19:45:03.9737329Z #define __SIZEOF_SIZE_T__ 8 2025-05-07T19:45:03.9737760Z #define __SIZEOF_WCHAR_T__ 4 2025-05-07T19:45:03.9738073Z #define __SIZEOF_WINT_T__ 4 2025-05-07T19:45:03.9738365Z #define __SIZE_FMTX__ "lX" 2025-05-07T19:45:03.9738670Z #define __SIZE_FMTo__ "lo" 2025-05-07T19:45:03.9738952Z #define __SIZE_FMTu__ "lu" 2025-05-07T19:45:03.9739251Z #define __SIZE_FMTx__ "lx" 2025-05-07T19:45:03.9739537Z #define __SIZE_MAX__ 18446744073709551615UL 2025-05-07T19:45:03.9739912Z #define __SIZE_TYPE__ long unsigned int 
2025-05-07T19:45:03.9740235Z #define __SIZE_WIDTH__ 64 2025-05-07T19:45:03.9740553Z #define __SSE2_MATH__ 1 2025-05-07T19:45:03.9740850Z #define __SSE2__ 1 2025-05-07T19:45:03.9741100Z #define __SSE_MATH__ 1 2025-05-07T19:45:03.9741388Z #define __SSE__ 1 2025-05-07T19:45:03.9741667Z #define __STDCPP_DEFAULT_NEW_ALIGNMENT__ 16UL 2025-05-07T19:45:03.9742047Z #define __STDCPP_THREADS__ 1 2025-05-07T19:45:03.9742338Z #define __STDC_HOSTED__ 1 2025-05-07T19:45:03.9742645Z #define __STDC_UTF_16__ 1 2025-05-07T19:45:03.9742919Z #define __STDC_UTF_32__ 1 2025-05-07T19:45:03.9743208Z #define __STDC__ 1 2025-05-07T19:45:03.9743462Z #define __UINT16_C_SUFFIX__ 2025-05-07T19:45:03.9743777Z #define __UINT16_FMTX__ "hX" 2025-05-07T19:45:03.9744092Z #define __UINT16_FMTo__ "ho" 2025-05-07T19:45:03.9744381Z #define __UINT16_FMTu__ "hu" 2025-05-07T19:45:03.9744694Z #define __UINT16_FMTx__ "hx" 2025-05-07T19:45:03.9744981Z #define __UINT16_MAX__ 65535 2025-05-07T19:45:03.9745303Z #define __UINT16_TYPE__ unsigned short 2025-05-07T19:45:03.9745623Z #define __UINT32_C_SUFFIX__ U 2025-05-07T19:45:03.9745945Z #define __UINT32_FMTX__ "X" 2025-05-07T19:45:03.9746222Z #define __UINT32_FMTo__ "o" 2025-05-07T19:45:03.9746523Z #define __UINT32_FMTu__ "u" 2025-05-07T19:45:03.9746795Z #define __UINT32_FMTx__ "x" 2025-05-07T19:45:03.9747104Z #define __UINT32_MAX__ 4294967295U 2025-05-07T19:45:03.9747437Z #define __UINT32_TYPE__ unsigned int 2025-05-07T19:45:03.9747750Z #define __UINT64_C_SUFFIX__ UL 2025-05-07T19:45:03.9748071Z #define __UINT64_FMTX__ "lX" 2025-05-07T19:45:03.9748357Z #define __UINT64_FMTo__ "lo" 2025-05-07T19:45:03.9748671Z #define __UINT64_FMTu__ "lu" 2025-05-07T19:45:03.9748954Z #define __UINT64_FMTx__ "lx" 2025-05-07T19:45:03.9749281Z #define __UINT64_MAX__ 18446744073709551615UL 2025-05-07T19:45:03.9749634Z #define __UINT64_TYPE__ long unsigned int 2025-05-07T19:45:03.9750085Z #define __UINT8_C_SUFFIX__ 2025-05-07T19:45:03.9750350Z #define __UINT8_FMTX__ "hhX" 
2025-05-07T19:45:03.9750642Z #define __UINT8_FMTo__ "hho" 2025-05-07T19:45:03.9750933Z #define __UINT8_FMTu__ "hhu" 2025-05-07T19:45:03.9751203Z #define __UINT8_FMTx__ "hhx" 2025-05-07T19:45:03.9751495Z #define __UINT8_MAX__ 255 2025-05-07T19:45:03.9751764Z #define __UINT8_TYPE__ unsigned char 2025-05-07T19:45:03.9752088Z #define __UINTMAX_C_SUFFIX__ UL 2025-05-07T19:45:03.9752372Z #define __UINTMAX_FMTX__ "lX" 2025-05-07T19:45:03.9752670Z #define __UINTMAX_FMTo__ "lo" 2025-05-07T19:45:03.9752938Z #define __UINTMAX_FMTu__ "lu" 2025-05-07T19:45:03.9753233Z #define __UINTMAX_FMTx__ "lx" 2025-05-07T19:45:03.9753525Z #define __UINTMAX_MAX__ 18446744073709551615UL 2025-05-07T19:45:03.9753889Z #define __UINTMAX_TYPE__ long unsigned int 2025-05-07T19:45:03.9754227Z #define __UINTMAX_WIDTH__ 64 2025-05-07T19:45:03.9754596Z #define __UINTPTR_FMTX__ "lX" 2025-05-07T19:45:03.9754903Z #define __UINTPTR_FMTo__ "lo" 2025-05-07T19:45:03.9755177Z #define __UINTPTR_FMTu__ "lu" 2025-05-07T19:45:03.9755479Z #define __UINTPTR_FMTx__ "lx" 2025-05-07T19:45:03.9755780Z #define __UINTPTR_MAX__ 18446744073709551615UL 2025-05-07T19:45:03.9756163Z #define __UINTPTR_TYPE__ long unsigned int 2025-05-07T19:45:03.9759750Z #define __UINTPTR_WIDTH__ 64 2025-05-07T19:45:03.9760132Z #define __UINT_FAST16_FMTX__ "hX" 2025-05-07T19:45:03.9760427Z #define __UINT_FAST16_FMTo__ "ho" 2025-05-07T19:45:03.9760754Z #define __UINT_FAST16_FMTu__ "hu" 2025-05-07T19:45:03.9761074Z #define __UINT_FAST16_FMTx__ "hx" 2025-05-07T19:45:03.9761383Z #define __UINT_FAST16_MAX__ 65535 2025-05-07T19:45:03.9761685Z #define __UINT_FAST16_TYPE__ unsigned short 2025-05-07T19:45:03.9762033Z #define __UINT_FAST32_FMTX__ "X" 2025-05-07T19:45:03.9762317Z #define __UINT_FAST32_FMTo__ "o" 2025-05-07T19:45:03.9762626Z #define __UINT_FAST32_FMTu__ "u" 2025-05-07T19:45:03.9762906Z #define __UINT_FAST32_FMTx__ "x" 2025-05-07T19:45:03.9763227Z #define __UINT_FAST32_MAX__ 4294967295U 2025-05-07T19:45:03.9763631Z #define 
__UINT_FAST32_TYPE__ unsigned int 2025-05-07T19:45:03.9763950Z #define __UINT_FAST64_FMTX__ "lX" 2025-05-07T19:45:03.9764272Z #define __UINT_FAST64_FMTo__ "lo" 2025-05-07T19:45:03.9764561Z #define __UINT_FAST64_FMTu__ "lu" 2025-05-07T19:45:03.9764894Z #define __UINT_FAST64_FMTx__ "lx" 2025-05-07T19:45:03.9765208Z #define __UINT_FAST64_MAX__ 18446744073709551615UL 2025-05-07T19:45:03.9765604Z #define __UINT_FAST64_TYPE__ long unsigned int 2025-05-07T19:45:03.9765969Z #define __UINT_FAST8_FMTX__ "hhX" 2025-05-07T19:45:03.9766260Z #define __UINT_FAST8_FMTo__ "hho" 2025-05-07T19:45:03.9766578Z #define __UINT_FAST8_FMTu__ "hhu" 2025-05-07T19:45:03.9766867Z #define __UINT_FAST8_FMTx__ "hhx" 2025-05-07T19:45:03.9767179Z #define __UINT_FAST8_MAX__ 255 2025-05-07T19:45:03.9767634Z #define __UINT_FAST8_TYPE__ unsigned char 2025-05-07T19:45:03.9768179Z #define __UINT_LEAST16_FMTX__ "hX" 2025-05-07T19:45:03.9768502Z #define __UINT_LEAST16_FMTo__ "ho" 2025-05-07T19:45:03.9768860Z #define __UINT_LEAST16_FMTu__ "hu" 2025-05-07T19:45:03.9769170Z #define __UINT_LEAST16_FMTx__ "hx" 2025-05-07T19:45:03.9769500Z #define __UINT_LEAST16_MAX__ 65535 2025-05-07T19:45:03.9769850Z #define __UINT_LEAST16_TYPE__ unsigned short 2025-05-07T19:45:03.9770200Z #define __UINT_LEAST32_FMTX__ "X" 2025-05-07T19:45:03.9770528Z #define __UINT_LEAST32_FMTo__ "o" 2025-05-07T19:45:03.9770828Z #define __UINT_LEAST32_FMTu__ "u" 2025-05-07T19:45:03.9771159Z #define __UINT_LEAST32_FMTx__ "x" 2025-05-07T19:45:03.9771470Z #define __UINT_LEAST32_MAX__ 4294967295U 2025-05-07T19:45:03.9771830Z #define __UINT_LEAST32_TYPE__ unsigned int 2025-05-07T19:45:03.9772165Z #define __UINT_LEAST64_FMTX__ "lX" 2025-05-07T19:45:03.9772501Z #define __UINT_LEAST64_FMTo__ "lo" 2025-05-07T19:45:03.9772836Z #define __UINT_LEAST64_FMTu__ "lu" 2025-05-07T19:45:03.9773144Z #define __UINT_LEAST64_FMTx__ "lx" 2025-05-07T19:45:03.9773500Z #define __UINT_LEAST64_MAX__ 18446744073709551615UL 2025-05-07T19:45:03.9773893Z #define 
__UINT_LEAST64_TYPE__ long unsigned int 2025-05-07T19:45:03.9774276Z #define __UINT_LEAST8_FMTX__ "hhX" 2025-05-07T19:45:03.9774584Z #define __UINT_LEAST8_FMTo__ "hho" 2025-05-07T19:45:03.9774916Z #define __UINT_LEAST8_FMTu__ "hhu" 2025-05-07T19:45:03.9775228Z #define __UINT_LEAST8_FMTx__ "hhx" 2025-05-07T19:45:03.9775559Z #define __UINT_LEAST8_MAX__ 255 2025-05-07T19:45:03.9775867Z #define __UINT_LEAST8_TYPE__ unsigned char 2025-05-07T19:45:03.9776225Z #define __USER_LABEL_PREFIX__ 2025-05-07T19:45:03.9776919Z #define __VERSION__ "Clang 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:45:03.9777659Z #define __WCHAR_MAX__ 2147483647 2025-05-07T19:45:03.9777995Z #define __WCHAR_TYPE__ int 2025-05-07T19:45:03.9778270Z #define __WCHAR_WIDTH__ 32 2025-05-07T19:45:03.9778577Z #define __WINT_MAX__ 4294967295U 2025-05-07T19:45:03.9778875Z #define __WINT_TYPE__ unsigned int 2025-05-07T19:45:03.9779367Z #define __WINT_UNSIGNED__ 1 2025-05-07T19:45:03.9779657Z #define __WINT_WIDTH__ 32 2025-05-07T19:45:03.9779960Z #define __amd64 1 2025-05-07T19:45:03.9780235Z #define __amd64__ 1 2025-05-07T19:45:03.9780481Z #define __clang__ 1 2025-05-07T19:45:03.9780786Z #define __clang_literal_encoding__ "UTF-8" 2025-05-07T19:45:03.9781212Z #define __clang_major__ 16 2025-05-07T19:45:03.9781532Z #define __clang_minor__ 0 2025-05-07T19:45:03.9781819Z #define __clang_patchlevel__ 6 2025-05-07T19:45:03.9782495Z #define __clang_version__ "16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:45:03.9783193Z #define __clang_wide_literal_encoding__ "UTF-32" 2025-05-07T19:45:03.9783585Z #define __code_model_small__ 1 2025-05-07T19:45:03.9783914Z #define __cplusplus 201703L 2025-05-07T19:45:03.9784216Z #define __cpp_aggregate_bases 201603L 2025-05-07T19:45:03.9784581Z #define __cpp_aggregate_nsdmi 201304L 2025-05-07T19:45:03.9784913Z #define __cpp_alias_templates 200704L 
2025-05-07T19:45:03.9785271Z #define __cpp_aligned_new 201606L 2025-05-07T19:45:03.9785584Z #define __cpp_attributes 200809L 2025-05-07T19:45:03.9785926Z #define __cpp_binary_literals 201304L 2025-05-07T19:45:03.9786260Z #define __cpp_capture_star_this 201603L 2025-05-07T19:45:03.9786621Z #define __cpp_constexpr 201603L 2025-05-07T19:45:03.9786949Z #define __cpp_constexpr_in_decltype 201711L 2025-05-07T19:45:03.9787322Z #define __cpp_decltype 200707L 2025-05-07T19:45:03.9787650Z #define __cpp_decltype_auto 201304L 2025-05-07T19:45:03.9787972Z #define __cpp_deduction_guides 201703L 2025-05-07T19:45:03.9788352Z #define __cpp_delegating_constructors 200604L 2025-05-07T19:45:03.9788710Z #define __cpp_digit_separators 201309L 2025-05-07T19:45:03.9789185Z #define __cpp_enumerator_attributes 201411L 2025-05-07T19:45:03.9789505Z #define __cpp_exceptions 199711L 2025-05-07T19:45:03.9789825Z #define __cpp_fold_expressions 201603L 2025-05-07T19:45:03.9790134Z #define __cpp_generic_lambdas 201304L 2025-05-07T19:45:03.9790489Z #define __cpp_guaranteed_copy_elision 201606L 2025-05-07T19:45:03.9790817Z #define __cpp_hex_float 201603L 2025-05-07T19:45:03.9791132Z #define __cpp_if_constexpr 201606L 2025-05-07T19:45:03.9791480Z #define __cpp_impl_destroying_delete 201806L 2025-05-07T19:45:03.9791830Z #define __cpp_inheriting_constructors 201511L 2025-05-07T19:45:03.9792191Z #define __cpp_init_captures 201304L 2025-05-07T19:45:03.9792502Z #define __cpp_initializer_lists 200806L 2025-05-07T19:45:03.9792858Z #define __cpp_inline_variables 201606L 2025-05-07T19:45:03.9793165Z #define __cpp_lambdas 200907L 2025-05-07T19:45:03.9793498Z #define __cpp_named_character_escapes 202207L 2025-05-07T19:45:03.9793846Z #define __cpp_namespace_attributes 201411L 2025-05-07T19:45:03.9794233Z #define __cpp_nested_namespace_definitions 201411L 2025-05-07T19:45:03.9794625Z #define __cpp_noexcept_function_type 201510L 2025-05-07T19:45:03.9794965Z #define __cpp_nontype_template_args 201411L 
2025-05-07T19:45:03.9795352Z #define __cpp_nontype_template_parameter_auto 201606L 2025-05-07T19:45:03.9795705Z #define __cpp_nsdmi 200809L 2025-05-07T19:45:03.9796012Z #define __cpp_range_based_for 201603L 2025-05-07T19:45:03.9796313Z #define __cpp_raw_strings 200710L 2025-05-07T19:45:03.9796633Z #define __cpp_ref_qualifiers 200710L 2025-05-07T19:45:03.9796953Z #define __cpp_return_type_deduction 201304L 2025-05-07T19:45:03.9797297Z #define __cpp_rtti 199711L 2025-05-07T19:45:03.9797601Z #define __cpp_rvalue_references 200610L 2025-05-07T19:45:03.9797912Z #define __cpp_static_assert 201411L 2025-05-07T19:45:03.9798251Z #define __cpp_static_call_operator 202207L 2025-05-07T19:45:03.9798583Z #define __cpp_structured_bindings 201606L 2025-05-07T19:45:03.9798934Z #define __cpp_template_auto 201606L 2025-05-07T19:45:03.9799253Z #define __cpp_threadsafe_static_init 200806L 2025-05-07T19:45:03.9799617Z #define __cpp_unicode_characters 200704L 2025-05-07T19:45:03.9799940Z #define __cpp_unicode_literals 200710L 2025-05-07T19:45:03.9800292Z #define __cpp_user_defined_literals 200809L 2025-05-07T19:45:03.9800713Z #define __cpp_variable_templates 201304L 2025-05-07T19:45:03.9801069Z #define __cpp_variadic_templates 200704L 2025-05-07T19:45:03.9801427Z #define __cpp_variadic_using 201611L 2025-05-07T19:45:03.9801725Z #define __gnu_linux__ 1 2025-05-07T19:45:03.9802012Z #define __k8 1 2025-05-07T19:45:03.9802297Z #define __k8__ 1 2025-05-07T19:45:03.9802554Z #define __linux 1 2025-05-07T19:45:03.9802781Z #define __linux__ 1 2025-05-07T19:45:03.9803038Z #define __llvm__ 1 2025-05-07T19:45:03.9803264Z #define __pic__ 2 2025-05-07T19:45:03.9803528Z #define __pie__ 2 2025-05-07T19:45:03.9803775Z #define __private_extern__ extern 2025-05-07T19:45:03.9804142Z #define __seg_fs __attribute__((address_space(257))) 2025-05-07T19:45:03.9804552Z #define __seg_gs __attribute__((address_space(256))) 2025-05-07T19:45:03.9804885Z #define __tune_k8__ 1 2025-05-07T19:45:03.9805146Z #define 
__unix 1 2025-05-07T19:45:03.9805367Z #define __unix__ 1 2025-05-07T19:45:03.9805617Z #define __x86_64 1 2025-05-07T19:45:03.9805839Z #define __x86_64__ 1 2025-05-07T19:45:03.9806092Z #define linux 1 2025-05-07T19:45:03.9806315Z #define unix 1 2025-05-07T19:45:03.9806475Z 2025-05-07T19:45:04.0409711Z 2025-05-07T19:45:04.0410676Z + conda run -n build_binary c++ --version 2025-05-07T19:45:04.0411050Z 2025-05-07T19:45:05.8162820Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:45:05.8164729Z Target: x86_64-conda-linux-gnu 2025-05-07T19:45:05.8165559Z Thread model: posix 2025-05-07T19:45:05.8166479Z InstalledDir: /github/home/miniconda/envs/build_binary/bin 2025-05-07T19:45:05.8168056Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang++.cfg 2025-05-07T19:45:05.8168536Z 2025-05-07T19:45:05.8727234Z 2025-05-07T19:45:05.8727820Z [INFO] Printing the default version of the C standard used by the compiler ... 2025-05-07T19:45:05.8728429Z + conda run -n build_binary cc -dM -E - < /dev/null | grep __STDC_VERSION__ 2025-05-07T19:45:05.8728797Z 2025-05-07T19:45:07.7534389Z #define __STDC_VERSION__ 201710L 2025-05-07T19:45:07.7535055Z 2025-05-07T19:45:07.7535849Z [INFO] Printing the default version of the C++ standard used by the compiler ... 2025-05-07T19:45:07.7537948Z + conda run -n build_binary c++ -dM -E -x c++ - < /dev/null | grep __cplusplus 2025-05-07T19:45:07.7538880Z 2025-05-07T19:45:09.6468457Z #define __cplusplus 201703L 2025-05-07T19:45:09.6469115Z 2025-05-07T19:45:09.6469568Z [INSTALL] Successfully installed C/C++ compilers 2025-05-07T19:45:09.6547455Z ##[group]Run . $PRELUDE; install_build_tools $BUILD_ENV 2025-05-07T19:45:09.6547977Z . 
$PRELUDE; install_build_tools $BUILD_ENV 2025-05-07T19:45:09.6548908Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:45:09.6549289Z env: 2025-05-07T19:45:09.6549567Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:45:09.6550023Z BUILD_ENV: build_binary 2025-05-07T19:45:09.6550320Z BUILD_TARGET: genai 2025-05-07T19:45:09.6550597Z BUILD_VARIANT: cuda 2025-05-07T19:45:09.6550881Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:45:09.6551154Z ##[endgroup] 2025-05-07T19:45:10.0875660Z ################################################################################ 2025-05-07T19:45:10.0876724Z # Install Build Tools 2025-05-07T19:45:10.0877420Z # 2025-05-07T19:45:10.0892279Z # [2025-05-07T19:45:10.088Z] + install_build_tools build_binary 2025-05-07T19:45:10.0893542Z ################################################################################ 2025-05-07T19:45:10.0894366Z 2025-05-07T19:45:10.0916837Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:45:10.1755951Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:45:10.1758381Z [INSTALL] Installing build tools ... 
2025-05-07T19:45:10.1783235Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y auditwheel bazel cmake>=3.30 hypothesis jinja2 make ncurses ninja openblas patchelf rhash scikit-build wheel pyyaml 2025-05-07T19:45:10.9026474Z Channels: 2025-05-07T19:45:10.9027210Z - conda-forge 2025-05-07T19:45:10.9027910Z Platform: linux-64 2025-05-07T19:45:14.0156905Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:45:17.7679270Z Solving environment: \ | / - done 2025-05-07T19:45:17.8287726Z 2025-05-07T19:45:17.8288116Z ## Package Plan ## 2025-05-07T19:45:17.8288295Z 2025-05-07T19:45:17.8288669Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:45:17.8289058Z 2025-05-07T19:45:17.8289244Z added / updated specs: 2025-05-07T19:45:17.8289522Z - auditwheel 2025-05-07T19:45:17.8289806Z - bazel 2025-05-07T19:45:17.8290042Z - cmake[version='>=3.30'] 2025-05-07T19:45:17.8290530Z - hypothesis 2025-05-07T19:45:17.8290767Z - jinja2 2025-05-07T19:45:17.8291019Z - make 2025-05-07T19:45:17.8291237Z - ncurses 2025-05-07T19:45:17.8291483Z - ninja 2025-05-07T19:45:17.8291709Z - openblas 2025-05-07T19:45:17.8292099Z - patchelf 2025-05-07T19:45:17.8292325Z - pyyaml 2025-05-07T19:45:17.8292573Z - rhash 2025-05-07T19:45:17.8292793Z - scikit-build 2025-05-07T19:45:17.8293066Z - wheel 2025-05-07T19:45:17.8293191Z 2025-05-07T19:45:17.8293195Z 2025-05-07T19:45:17.8293355Z The following packages will be downloaded: 2025-05-07T19:45:17.8293596Z 2025-05-07T19:45:17.8293728Z package | build 2025-05-07T19:45:17.8294120Z ---------------------------|----------------- 2025-05-07T19:45:17.8294648Z alsa-lib-1.2.14 | hb9d3cd8_0 553 KB conda-forge 2025-05-07T19:45:17.8295119Z attrs-25.3.0 | pyh71513ae_0 56 KB conda-forge 2025-05-07T19:45:17.8295584Z auditwheel-6.2.0 | pyha804496_1 40 KB conda-forge 2025-05-07T19:45:17.8296026Z bazel-7.5.0 | h96810dc_2 47.4 MB conda-forge 2025-05-07T19:45:17.8296468Z bzip2-1.0.8 | h4bc722e_7 247 
KB conda-forge 2025-05-07T19:45:17.8296893Z c-ares-1.34.5 | hb9d3cd8_0 202 KB conda-forge 2025-05-07T19:45:17.8297656Z cairo-1.18.0 | hbb29018_2 961 KB conda-forge 2025-05-07T19:45:17.8298176Z click-8.1.8 | pyh707e725_0 83 KB conda-forge 2025-05-07T19:45:17.8298653Z cmake-4.0.2 | h74e3db0_0 19.4 MB conda-forge 2025-05-07T19:45:17.8299498Z distro-1.9.0 | pyhd8ed1ab_1 41 KB conda-forge 2025-05-07T19:45:17.8300007Z exceptiongroup-1.2.2 | pyhd8ed1ab_1 20 KB conda-forge 2025-05-07T19:45:17.8300604Z font-ttf-dejavu-sans-mono-2.37| hab24e00_0 388 KB conda-forge 2025-05-07T19:45:17.8301179Z font-ttf-inconsolata-3.000 | h77eed37_0 94 KB conda-forge 2025-05-07T19:45:17.8301778Z font-ttf-source-code-pro-2.038| h77eed37_0 684 KB conda-forge 2025-05-07T19:45:17.8302335Z font-ttf-ubuntu-0.83 | h77eed37_3 1.5 MB conda-forge 2025-05-07T19:45:17.8302848Z fontconfig-2.15.0 | h7e30c49_1 259 KB conda-forge 2025-05-07T19:45:17.8303380Z fonts-conda-ecosystem-1 | 0 4 KB conda-forge 2025-05-07T19:45:17.8304006Z fonts-conda-forge-1 | 0 4 KB conda-forge 2025-05-07T19:45:17.8304538Z freetype-2.13.3 | ha770c72_1 168 KB conda-forge 2025-05-07T19:45:17.8304981Z giflib-5.2.2 | hd590300_0 75 KB conda-forge 2025-05-07T19:45:17.8305433Z graphite2-1.3.13 | h59595ed_1003 95 KB conda-forge 2025-05-07T19:45:17.8305869Z harfbuzz-9.0.0 | hfac3d4d_0 1.5 MB conda-forge 2025-05-07T19:45:17.8306341Z hypothesis-6.131.14 | pyha770c72_0 348 KB conda-forge 2025-05-07T19:45:17.8306769Z ijar-7.5.0 | h5888daf_0 114 KB conda-forge 2025-05-07T19:45:17.8307349Z jinja2-3.1.6 | pyhd8ed1ab_0 110 KB conda-forge 2025-05-07T19:45:17.8307775Z keyutils-1.6.1 | h166bdaf_0 115 KB conda-forge 2025-05-07T19:45:17.8308211Z krb5-1.21.3 | h659f571_0 1.3 MB conda-forge 2025-05-07T19:45:17.8308644Z lcms2-2.17 | h717163a_0 242 KB conda-forge 2025-05-07T19:45:17.8309047Z lerc-4.0.0 | h0aef613_1 258 KB conda-forge 2025-05-07T19:45:17.8309537Z libabseil-20250127.1 | cxx17_hbbce691_0 1.3 MB conda-forge 2025-05-07T19:45:17.8309994Z 
libcups-2.3.3 | h4637d8d_4 4.3 MB conda-forge 2025-05-07T19:45:17.8310440Z libcurl-8.13.0 | h332b0f4_0 428 KB conda-forge 2025-05-07T19:45:17.8310868Z libdeflate-1.23 | h86f0d12_0 71 KB conda-forge 2025-05-07T19:45:17.8311357Z libedit-3.1.20250104 | pl5321h7949ede_0 132 KB conda-forge 2025-05-07T19:45:17.8311825Z libev-4.33 | hd590300_2 110 KB conda-forge 2025-05-07T19:45:17.8312242Z libexpat-2.7.0 | h5888daf_0 73 KB conda-forge 2025-05-07T19:45:17.8312712Z libfreetype-2.13.3 | ha770c72_1 8 KB conda-forge 2025-05-07T19:45:17.8313172Z libfreetype6-2.13.3 | h48d6fc4_1 371 KB conda-forge 2025-05-07T19:45:17.8313658Z libgfortran-15.1.0 | h69a702a_2 34 KB conda-forge 2025-05-07T19:45:17.8314142Z libgfortran5-15.1.0 | hcea5267_2 1.5 MB conda-forge 2025-05-07T19:45:17.8314586Z libglib-2.84.0 | h2ff4ddf_0 3.8 MB conda-forge 2025-05-07T19:45:17.8315032Z libgrpc-1.71.0 | h8e591d7_1 7.6 MB conda-forge 2025-05-07T19:45:17.8315476Z libjpeg-turbo-3.1.0 | hb9d3cd8_0 614 KB conda-forge 2025-05-07T19:45:17.8315944Z liblzma-5.8.1 | hb9d3cd8_1 110 KB conda-forge 2025-05-07T19:45:17.8316380Z liblzma-devel-5.8.1 | hb9d3cd8_1 431 KB conda-forge 2025-05-07T19:45:17.8316860Z libnghttp2-1.64.0 | h161d5f1_0 632 KB conda-forge 2025-05-07T19:45:17.8317312Z libnsl-2.0.1 | hd590300_0 33 KB conda-forge 2025-05-07T19:45:17.8317762Z libopenblas-0.3.29 |pthreads_h94d23a6_0 5.6 MB conda-forge 2025-05-07T19:45:17.8318345Z libpng-1.6.47 | h943b412_0 282 KB conda-forge 2025-05-07T19:45:17.8318783Z libprotobuf-5.29.3 | h501fc15_1 3.2 MB conda-forge 2025-05-07T19:45:17.8319257Z libre2-11-2024.07.02 | hba17884_3 205 KB conda-forge 2025-05-07T19:45:17.8319704Z libsqlite-3.49.2 | hee588c1_0 895 KB conda-forge 2025-05-07T19:45:17.8320164Z libssh2-1.11.1 | hcf80075_0 298 KB conda-forge 2025-05-07T19:45:17.8320613Z libtiff-4.7.0 | hd9ff511_4 419 KB conda-forge 2025-05-07T19:45:17.8321034Z libuuid-2.38.1 | h0b41bf4_0 33 KB conda-forge 2025-05-07T19:45:17.8321477Z libuv-1.50.0 | hb9d3cd8_0 870 KB conda-forge 
2025-05-07T19:45:17.8321907Z libwebp-base-1.5.0 | h851e524_0 420 KB conda-forge 2025-05-07T19:45:17.8322365Z libxcb-1.17.0 | h8a09558_0 387 KB conda-forge 2025-05-07T19:45:17.8322802Z libzlib-1.3.1 | hb9d3cd8_2 60 KB conda-forge 2025-05-07T19:45:17.8323205Z make-4.4.1 | hb9d3cd8_2 501 KB conda-forge 2025-05-07T19:45:17.8323657Z markupsafe-3.0.2 | py39h9399b63_1 22 KB conda-forge 2025-05-07T19:45:17.8324088Z ncurses-6.5 | h2d0b736_3 871 KB conda-forge 2025-05-07T19:45:17.8324598Z ninja-1.12.1 | hff21bea_1 158 KB conda-forge 2025-05-07T19:45:17.8325033Z openblas-0.3.29 |pthreads_h6ec200e_0 5.8 MB conda-forge 2025-05-07T19:45:17.8325506Z openjdk-23.0.1 | h4c11d01_0 181.3 MB conda-forge 2025-05-07T19:45:17.8325960Z packaging-25.0 | pyh29332c3_1 61 KB conda-forge 2025-05-07T19:45:17.8326396Z patchelf-0.18.0 | h3f2d84a_2 133 KB conda-forge 2025-05-07T19:45:17.8326835Z pcre2-10.44 | hc749103_2 934 KB conda-forge 2025-05-07T19:45:17.8327226Z pixman-0.46.0 | h29eaf8c_0 389 KB conda-forge 2025-05-07T19:45:17.8327664Z pthread-stubs-0.4 | hb9d3cd8_1002 8 KB conda-forge 2025-05-07T19:45:17.8328114Z pyelftools-0.32 | pyh707e725_1 146 KB conda-forge 2025-05-07T19:45:17.8328590Z python-3.9.22 |h85ef794_1_cpython 22.5 MB conda-forge 2025-05-07T19:45:17.8329043Z pyyaml-6.0.2 | py39h9399b63_2 178 KB conda-forge 2025-05-07T19:45:17.8329456Z re2-2024.07.02 | h9925aae_3 26 KB conda-forge 2025-05-07T19:45:17.8329891Z rhash-1.4.5 | hb9d3cd8_0 183 KB conda-forge 2025-05-07T19:45:17.8330328Z scikit-build-0.18.1 | pyhae55e72_2 114 KB conda-forge 2025-05-07T19:45:17.8330815Z singlejar-7.5.0 | h0e684df_1 122 KB conda-forge 2025-05-07T19:45:17.8331310Z sortedcontainers-2.4.0 | pyhd8ed1ab_1 28 KB conda-forge 2025-05-07T19:45:17.8331778Z sqlite-3.49.2 | h9eae976_0 840 KB conda-forge 2025-05-07T19:45:17.8332208Z tk-8.6.13 |noxft_h4845f30_101 3.2 MB conda-forge 2025-05-07T19:45:17.8332617Z tomli-2.2.1 | pyhd8ed1ab_1 19 KB conda-forge 2025-05-07T19:45:17.8333077Z wheel-0.45.1 | pyhd8ed1ab_1 61 KB 
conda-forge 2025-05-07T19:45:17.8333491Z xorg-libice-1.1.2 | hb9d3cd8_0 57 KB conda-forge 2025-05-07T19:45:17.8333935Z xorg-libsm-1.2.6 | he73a12e_0 27 KB conda-forge 2025-05-07T19:45:17.8334385Z xorg-libx11-1.8.12 | h4f16b4b_0 816 KB conda-forge 2025-05-07T19:45:17.8334821Z xorg-libxau-1.0.12 | hb9d3cd8_0 14 KB conda-forge 2025-05-07T19:45:17.8335359Z xorg-libxdmcp-1.1.5 | hb9d3cd8_0 19 KB conda-forge 2025-05-07T19:45:17.8335804Z xorg-libxext-1.3.6 | hb9d3cd8_0 49 KB conda-forge 2025-05-07T19:45:17.8336278Z xorg-libxfixes-6.0.1 | hb9d3cd8_0 19 KB conda-forge 2025-05-07T19:45:17.8336718Z xorg-libxi-1.8.2 | hb9d3cd8_0 46 KB conda-forge 2025-05-07T19:45:17.8337289Z xorg-libxrandr-1.5.4 | hb9d3cd8_0 29 KB conda-forge 2025-05-07T19:45:17.8337994Z xorg-libxrender-0.9.12 | hb9d3cd8_0 32 KB conda-forge 2025-05-07T19:45:17.8338551Z xorg-libxt-1.3.1 | hb9d3cd8_0 371 KB conda-forge 2025-05-07T19:45:17.8339038Z xorg-libxtst-1.2.5 | hb9d3cd8_3 32 KB conda-forge 2025-05-07T19:45:17.8339470Z xz-5.8.1 | hbcc6ac9_1 23 KB conda-forge 2025-05-07T19:45:17.8339928Z xz-gpl-tools-5.8.1 | hbcc6ac9_1 33 KB conda-forge 2025-05-07T19:45:17.8340402Z xz-tools-5.8.1 | hb9d3cd8_1 94 KB conda-forge 2025-05-07T19:45:17.8340822Z yaml-0.2.5 | h7f98852_2 87 KB conda-forge 2025-05-07T19:45:17.8341252Z zlib-1.3.1 | hb9d3cd8_2 90 KB conda-forge 2025-05-07T19:45:17.8341656Z zstd-1.5.7 | hb8e6e7a_2 554 KB conda-forge 2025-05-07T19:45:17.8342164Z ------------------------------------------------------------ 2025-05-07T19:45:17.8342530Z Total: 330.1 MB 2025-05-07T19:45:17.8342783Z 2025-05-07T19:45:17.8342920Z The following NEW packages will be INSTALLED: 2025-05-07T19:45:17.8343155Z 2025-05-07T19:45:17.8343381Z alsa-lib conda-forge/linux-64::alsa-lib-1.2.14-hb9d3cd8_0 2025-05-07T19:45:17.8343842Z attrs conda-forge/noarch::attrs-25.3.0-pyh71513ae_0 2025-05-07T19:45:17.8344339Z auditwheel conda-forge/noarch::auditwheel-6.2.0-pyha804496_1 2025-05-07T19:45:17.8344809Z bazel 
conda-forge/linux-64::bazel-7.5.0-h96810dc_2 2025-05-07T19:45:17.8345251Z bzip2 conda-forge/linux-64::bzip2-1.0.8-h4bc722e_7 2025-05-07T19:45:17.8345702Z c-ares conda-forge/linux-64::c-ares-1.34.5-hb9d3cd8_0 2025-05-07T19:45:17.8346140Z cairo conda-forge/linux-64::cairo-1.18.0-hbb29018_2 2025-05-07T19:45:17.8346589Z click conda-forge/noarch::click-8.1.8-pyh707e725_0 2025-05-07T19:45:17.8347020Z cmake conda-forge/linux-64::cmake-4.0.2-h74e3db0_0 2025-05-07T19:45:17.8347472Z distro conda-forge/noarch::distro-1.9.0-pyhd8ed1ab_1 2025-05-07T19:45:17.8348008Z exceptiongroup conda-forge/noarch::exceptiongroup-1.2.2-pyhd8ed1ab_1 2025-05-07T19:45:17.8348642Z font-ttf-dejavu-s~ conda-forge/noarch::font-ttf-dejavu-sans-mono-2.37-hab24e00_0 2025-05-07T19:45:17.8349315Z font-ttf-inconsol~ conda-forge/noarch::font-ttf-inconsolata-3.000-h77eed37_0 2025-05-07T19:45:17.8350070Z font-ttf-source-c~ conda-forge/noarch::font-ttf-source-code-pro-2.038-h77eed37_0 2025-05-07T19:45:17.8350656Z font-ttf-ubuntu conda-forge/noarch::font-ttf-ubuntu-0.83-h77eed37_3 2025-05-07T19:45:17.8351167Z fontconfig conda-forge/linux-64::fontconfig-2.15.0-h7e30c49_1 2025-05-07T19:45:17.8351656Z fonts-conda-ecosy~ conda-forge/noarch::fonts-conda-ecosystem-1-0 2025-05-07T19:45:17.8352165Z fonts-conda-forge conda-forge/noarch::fonts-conda-forge-1-0 2025-05-07T19:45:17.8352630Z freetype conda-forge/linux-64::freetype-2.13.3-ha770c72_1 2025-05-07T19:45:17.8353093Z giflib conda-forge/linux-64::giflib-5.2.2-hd590300_0 2025-05-07T19:45:17.8353585Z graphite2 conda-forge/linux-64::graphite2-1.3.13-h59595ed_1003 2025-05-07T19:45:17.8354061Z harfbuzz conda-forge/linux-64::harfbuzz-9.0.0-hfac3d4d_0 2025-05-07T19:45:17.8354567Z hypothesis conda-forge/noarch::hypothesis-6.131.14-pyha770c72_0 2025-05-07T19:45:17.8355111Z ijar conda-forge/linux-64::ijar-7.5.0-h5888daf_0 2025-05-07T19:45:17.8355556Z jinja2 conda-forge/noarch::jinja2-3.1.6-pyhd8ed1ab_0 2025-05-07T19:45:17.8356001Z keyutils 
conda-forge/linux-64::keyutils-1.6.1-h166bdaf_0 2025-05-07T19:45:17.8356455Z krb5 conda-forge/linux-64::krb5-1.21.3-h659f571_0 2025-05-07T19:45:17.8356871Z lcms2 conda-forge/linux-64::lcms2-2.17-h717163a_0 2025-05-07T19:45:17.8357266Z lerc conda-forge/linux-64::lerc-4.0.0-h0aef613_1 2025-05-07T19:45:17.8357744Z libabseil conda-forge/linux-64::libabseil-20250127.1-cxx17_hbbce691_0 2025-05-07T19:45:17.8358229Z libcups conda-forge/linux-64::libcups-2.3.3-h4637d8d_4 2025-05-07T19:45:17.8358679Z libcurl conda-forge/linux-64::libcurl-8.13.0-h332b0f4_0 2025-05-07T19:45:17.8359140Z libdeflate conda-forge/linux-64::libdeflate-1.23-h86f0d12_0 2025-05-07T19:45:17.8359626Z libedit conda-forge/linux-64::libedit-3.1.20250104-pl5321h7949ede_0 2025-05-07T19:45:17.8360093Z libev conda-forge/linux-64::libev-4.33-hd590300_2 2025-05-07T19:45:17.8360518Z libexpat conda-forge/linux-64::libexpat-2.7.0-h5888daf_0 2025-05-07T19:45:17.8360999Z libfreetype conda-forge/linux-64::libfreetype-2.13.3-ha770c72_1 2025-05-07T19:45:17.8361512Z libfreetype6 conda-forge/linux-64::libfreetype6-2.13.3-h48d6fc4_1 2025-05-07T19:45:17.8362102Z libgfortran conda-forge/linux-64::libgfortran-15.1.0-h69a702a_2 2025-05-07T19:45:17.8362618Z libgfortran5 conda-forge/linux-64::libgfortran5-15.1.0-hcea5267_2 2025-05-07T19:45:17.8363089Z libglib conda-forge/linux-64::libglib-2.84.0-h2ff4ddf_0 2025-05-07T19:45:17.8363540Z libgrpc conda-forge/linux-64::libgrpc-1.71.0-h8e591d7_1 2025-05-07T19:45:17.8364031Z libjpeg-turbo conda-forge/linux-64::libjpeg-turbo-3.1.0-hb9d3cd8_0 2025-05-07T19:45:17.8364510Z liblzma conda-forge/linux-64::liblzma-5.8.1-hb9d3cd8_1 2025-05-07T19:45:17.8364995Z liblzma-devel conda-forge/linux-64::liblzma-devel-5.8.1-hb9d3cd8_1 2025-05-07T19:45:17.8365493Z libnghttp2 conda-forge/linux-64::libnghttp2-1.64.0-h161d5f1_0 2025-05-07T19:45:17.8365960Z libnsl conda-forge/linux-64::libnsl-2.0.1-hd590300_0 2025-05-07T19:45:17.8366471Z libopenblas conda-forge/linux-64::libopenblas-0.3.29-pthreads_h94d23a6_0 
2025-05-07T19:45:17.8366975Z libpng conda-forge/linux-64::libpng-1.6.47-h943b412_0 2025-05-07T19:45:17.8367862Z libprotobuf conda-forge/linux-64::libprotobuf-5.29.3-h501fc15_1 2025-05-07T19:45:17.8368380Z libre2-11 conda-forge/linux-64::libre2-11-2024.07.02-hba17884_3 2025-05-07T19:45:17.8368897Z libsqlite conda-forge/linux-64::libsqlite-3.49.2-hee588c1_0 2025-05-07T19:45:17.8369389Z libssh2 conda-forge/linux-64::libssh2-1.11.1-hcf80075_0 2025-05-07T19:45:17.8369851Z libtiff conda-forge/linux-64::libtiff-4.7.0-hd9ff511_4 2025-05-07T19:45:17.8370326Z libuuid conda-forge/linux-64::libuuid-2.38.1-h0b41bf4_0 2025-05-07T19:45:17.8370766Z libuv conda-forge/linux-64::libuv-1.50.0-hb9d3cd8_0 2025-05-07T19:45:17.8371264Z libwebp-base conda-forge/linux-64::libwebp-base-1.5.0-h851e524_0 2025-05-07T19:45:17.8371769Z libxcb conda-forge/linux-64::libxcb-1.17.0-h8a09558_0 2025-05-07T19:45:17.8372201Z make conda-forge/linux-64::make-4.4.1-hb9d3cd8_2 2025-05-07T19:45:17.8372694Z markupsafe conda-forge/linux-64::markupsafe-3.0.2-py39h9399b63_1 2025-05-07T19:45:17.8373178Z ninja conda-forge/linux-64::ninja-1.12.1-hff21bea_1 2025-05-07T19:45:17.8373685Z openblas conda-forge/linux-64::openblas-0.3.29-pthreads_h6ec200e_0 2025-05-07T19:45:17.8374194Z openjdk conda-forge/linux-64::openjdk-23.0.1-h4c11d01_0 2025-05-07T19:45:17.8374828Z packaging conda-forge/noarch::packaging-25.0-pyh29332c3_1 2025-05-07T19:45:17.8375336Z patchelf conda-forge/linux-64::patchelf-0.18.0-h3f2d84a_2 2025-05-07T19:45:17.8375795Z pcre2 conda-forge/linux-64::pcre2-10.44-hc749103_2 2025-05-07T19:45:17.8376254Z pixman conda-forge/linux-64::pixman-0.46.0-h29eaf8c_0 2025-05-07T19:45:17.8376764Z pthread-stubs conda-forge/linux-64::pthread-stubs-0.4-hb9d3cd8_1002 2025-05-07T19:45:17.8377417Z pyelftools conda-forge/noarch::pyelftools-0.32-pyh707e725_1 2025-05-07T19:45:17.8377927Z pyyaml conda-forge/linux-64::pyyaml-6.0.2-py39h9399b63_2 2025-05-07T19:45:17.8378368Z re2 conda-forge/linux-64::re2-2024.07.02-h9925aae_3 
2025-05-07T19:45:17.8378805Z rhash conda-forge/linux-64::rhash-1.4.5-hb9d3cd8_0 2025-05-07T19:45:17.8379294Z scikit-build conda-forge/noarch::scikit-build-0.18.1-pyhae55e72_2 2025-05-07T19:45:17.8379830Z singlejar conda-forge/linux-64::singlejar-7.5.0-h0e684df_1 2025-05-07T19:45:17.8380401Z sortedcontainers conda-forge/noarch::sortedcontainers-2.4.0-pyhd8ed1ab_1 2025-05-07T19:45:17.8380931Z tomli conda-forge/noarch::tomli-2.2.1-pyhd8ed1ab_1 2025-05-07T19:45:17.8381423Z xorg-libice conda-forge/linux-64::xorg-libice-1.1.2-hb9d3cd8_0 2025-05-07T19:45:17.8381927Z xorg-libsm conda-forge/linux-64::xorg-libsm-1.2.6-he73a12e_0 2025-05-07T19:45:17.8382451Z xorg-libx11 conda-forge/linux-64::xorg-libx11-1.8.12-h4f16b4b_0 2025-05-07T19:45:17.8383122Z xorg-libxau conda-forge/linux-64::xorg-libxau-1.0.12-hb9d3cd8_0 2025-05-07T19:45:17.8383659Z xorg-libxdmcp conda-forge/linux-64::xorg-libxdmcp-1.1.5-hb9d3cd8_0 2025-05-07T19:45:17.8384226Z xorg-libxext conda-forge/linux-64::xorg-libxext-1.3.6-hb9d3cd8_0 2025-05-07T19:45:17.8384777Z xorg-libxfixes conda-forge/linux-64::xorg-libxfixes-6.0.1-hb9d3cd8_0 2025-05-07T19:45:17.8385334Z xorg-libxi conda-forge/linux-64::xorg-libxi-1.8.2-hb9d3cd8_0 2025-05-07T19:45:17.8385882Z xorg-libxrandr conda-forge/linux-64::xorg-libxrandr-1.5.4-hb9d3cd8_0 2025-05-07T19:45:17.8386470Z xorg-libxrender conda-forge/linux-64::xorg-libxrender-0.9.12-hb9d3cd8_0 2025-05-07T19:45:17.8387050Z xorg-libxt conda-forge/linux-64::xorg-libxt-1.3.1-hb9d3cd8_0 2025-05-07T19:45:17.8387568Z xorg-libxtst conda-forge/linux-64::xorg-libxtst-1.2.5-hb9d3cd8_3 2025-05-07T19:45:17.8388122Z xz-gpl-tools conda-forge/linux-64::xz-gpl-tools-5.8.1-hbcc6ac9_1 2025-05-07T19:45:17.8388648Z xz-tools conda-forge/linux-64::xz-tools-5.8.1-hb9d3cd8_1 2025-05-07T19:45:17.8389098Z yaml conda-forge/linux-64::yaml-0.2.5-h7f98852_2 2025-05-07T19:45:17.8389364Z 2025-05-07T19:45:17.8389516Z The following packages will be UPDATED: 2025-05-07T19:45:17.8389737Z 2025-05-07T19:45:17.8389900Z libzlib 
1.2.13-h4ab18f5_6 --> 1.3.1-hb9d3cd8_2 2025-05-07T19:45:17.8390479Z ncurses pkgs/main::ncurses-6.4-h6a678d5_0 --> conda-forge::ncurses-6.5-h2d0b736_3 2025-05-07T19:45:17.8391189Z python pkgs/main::python-3.9.21-he870216_1 --> conda-forge::python-3.9.22-h85ef794_1_cpython 2025-05-07T19:45:17.8391877Z sqlite pkgs/main::sqlite-3.45.3-h5eee18b_0 --> conda-forge::sqlite-3.49.2-h9eae976_0 2025-05-07T19:45:17.8392589Z wheel pkgs/main/linux-64::wheel-0.45.1-py39~ --> conda-forge/noarch::wheel-0.45.1-pyhd8ed1ab_1 2025-05-07T19:45:17.8393231Z xz pkgs/main::xz-5.6.4-h5eee18b_1 --> conda-forge::xz-5.8.1-hbcc6ac9_1 2025-05-07T19:45:17.8393719Z zlib 1.2.13-h4ab18f5_6 --> 1.3.1-hb9d3cd8_2 2025-05-07T19:45:17.8394136Z zstd 1.5.6-ha6fb4c9_0 --> 1.5.7-hb8e6e7a_2 2025-05-07T19:45:17.8394399Z 2025-05-07T19:45:17.8394632Z The following packages will be SUPERSEDED by a higher-priority channel: 2025-05-07T19:45:17.8394984Z 2025-05-07T19:45:17.8395299Z tk pkgs/main::tk-8.6.14-h39e8969_0 --> conda-forge::tk-8.6.13-noxft_h4845f30_101 2025-05-07T19:45:17.8395655Z 2025-05-07T19:45:17.8395683Z 2025-05-07T19:45:17.8395688Z 2025-05-07T19:45:17.8395856Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:45:17.8396247Z openjdk-23.0.1 | 181.3 MB | | 0% 2025-05-07T19:45:17.8396510Z 2025-05-07T19:45:17.8396836Z bazel-7.5.0 | 47.4 MB | | 0%  2025-05-07T19:45:17.8397085Z 2025-05-07T19:45:17.8397099Z 2025-05-07T19:45:17.8405722Z python-3.9.22 | 22.5 MB | | 0%  2025-05-07T19:45:17.8406480Z 2025-05-07T19:45:17.8406491Z 2025-05-07T19:45:17.8408444Z 2025-05-07T19:45:17.8414766Z cmake-4.0.2 | 19.4 MB | | 0%  2025-05-07T19:45:17.8415531Z 2025-05-07T19:45:17.8415542Z 2025-05-07T19:45:17.8415553Z 2025-05-07T19:45:17.8415563Z 2025-05-07T19:45:17.8434747Z libgrpc-1.71.0 | 7.6 MB | | 0%  2025-05-07T19:45:17.8435709Z 2025-05-07T19:45:17.8435724Z 2025-05-07T19:45:17.8435735Z 2025-05-07T19:45:17.8435745Z 2025-05-07T19:45:17.8435755Z 2025-05-07T19:45:17.8436481Z openblas-0.3.29 | 5.8 MB | | 0%  2025-05-07T19:45:17.8436963Z 2025-05-07T19:45:17.8436966Z 2025-05-07T19:45:17.8436970Z 2025-05-07T19:45:17.8436990Z 2025-05-07T19:45:17.8436993Z 2025-05-07T19:45:17.8436997Z 2025-05-07T19:45:17.8437269Z libopenblas-0.3.29 | 5.6 MB | | 0%  2025-05-07T19:45:17.8437751Z 2025-05-07T19:45:17.8437755Z 2025-05-07T19:45:17.8437759Z 2025-05-07T19:45:17.8437762Z 2025-05-07T19:45:17.8437766Z 2025-05-07T19:45:17.8437785Z 2025-05-07T19:45:17.8437789Z 2025-05-07T19:45:17.8438035Z libcups-2.3.3 | 4.3 MB | | 0%  2025-05-07T19:45:17.8438319Z 2025-05-07T19:45:17.8438323Z 2025-05-07T19:45:17.8438326Z 2025-05-07T19:45:17.8438330Z 2025-05-07T19:45:17.8438333Z 2025-05-07T19:45:17.8438337Z 2025-05-07T19:45:17.8438345Z 2025-05-07T19:45:17.8438348Z 2025-05-07T19:45:17.8438611Z libglib-2.84.0 | 3.8 MB | | 0%  2025-05-07T19:45:17.8438896Z 2025-05-07T19:45:17.8438899Z 2025-05-07T19:45:17.8438903Z 2025-05-07T19:45:17.8438906Z 2025-05-07T19:45:17.8438910Z 2025-05-07T19:45:17.8438913Z 2025-05-07T19:45:17.8438917Z 2025-05-07T19:45:17.8438920Z 2025-05-07T19:45:17.8438924Z 2025-05-07T19:45:17.8439230Z libprotobuf-5.29.3 | 3.2 MB | | 0%  2025-05-07T19:45:17.8439539Z 2025-05-07T19:45:17.8439544Z 
2025-05-07T19:45:17.8439548Z 2025-05-07T19:45:17.8439551Z 2025-05-07T19:45:17.8439556Z 2025-05-07T19:45:17.8439559Z 2025-05-07T19:45:17.8439563Z 2025-05-07T19:45:17.8439566Z 2025-05-07T19:45:17.8439570Z 2025-05-07T19:45:17.8439593Z 2025-05-07T19:45:17.8439826Z tk-8.6.13 | 3.2 MB | | 0%  2025-05-07T19:45:17.8440100Z 2025-05-07T19:45:17.8440103Z 2025-05-07T19:45:17.8440111Z 2025-05-07T19:45:17.8440114Z 2025-05-07T19:45:17.8440117Z 2025-05-07T19:45:17.8440121Z 2025-05-07T19:45:17.8440124Z 2025-05-07T19:45:17.8440128Z 2025-05-07T19:45:17.8440131Z 2025-05-07T19:45:17.8440151Z 2025-05-07T19:45:17.8440155Z 2025-05-07T19:45:17.8440454Z font-ttf-ubuntu-0.83 | 1.5 MB | | 0%  2025-05-07T19:45:17.8440777Z 2025-05-07T19:45:17.8440781Z 2025-05-07T19:45:17.8440785Z 2025-05-07T19:45:17.8440792Z 2025-05-07T19:45:17.8440796Z 2025-05-07T19:45:17.8440799Z 2025-05-07T19:45:17.8440820Z 2025-05-07T19:45:17.8440824Z 2025-05-07T19:45:17.8440827Z 2025-05-07T19:45:17.8440831Z 2025-05-07T19:45:17.8440834Z 2025-05-07T19:45:17.8440837Z 2025-05-07T19:45:17.8441103Z harfbuzz-9.0.0 | 1.5 MB | | 0%  2025-05-07T19:45:17.8441419Z 2025-05-07T19:45:17.8441423Z 2025-05-07T19:45:17.8441441Z 2025-05-07T19:45:17.8441445Z 2025-05-07T19:45:17.8441448Z 2025-05-07T19:45:17.8441552Z 2025-05-07T19:45:17.8441557Z 2025-05-07T19:45:17.8441561Z 2025-05-07T19:45:17.8441564Z 2025-05-07T19:45:17.8441568Z 2025-05-07T19:45:17.8441571Z 2025-05-07T19:45:17.8441574Z 2025-05-07T19:45:17.8441578Z 2025-05-07T19:45:17.8441879Z libgfortran5-15.1.0 | 1.5 MB | | 0%  2025-05-07T19:45:17.8442226Z 2025-05-07T19:45:17.8442230Z 2025-05-07T19:45:17.8442234Z 2025-05-07T19:45:17.8442237Z 2025-05-07T19:45:17.8442245Z 2025-05-07T19:45:17.8442248Z 2025-05-07T19:45:17.8442252Z 2025-05-07T19:45:17.8442255Z 2025-05-07T19:45:17.8442259Z 2025-05-07T19:45:17.8442262Z 2025-05-07T19:45:17.8442265Z 2025-05-07T19:45:17.8442269Z 2025-05-07T19:45:17.8442273Z 2025-05-07T19:45:17.8442276Z 2025-05-07T19:45:17.8442612Z krb5-1.21.3 | 1.3 MB | | 0%  
2025-05-07T19:45:17.8444033Z libabseil-20250127.1 | 1.3 MB | | 0% 2025-05-07T19:45:17.8445086Z cairo-1.18.0 | 961 KB | | 0% 2025-05-07T19:45:17.8445801Z pcre2-10.44 | 934 KB | | 0%
2025-05-07T19:45:17.8447693Z libsqlite-3.49.2 | 895 KB | | 0% 2025-05-07T19:45:18.0384922Z ... (more hidden) ... 2025-05-07T19:45:18.1461906Z libgrpc-1.71.0 | 7.6 MB | | 1% 2025-05-07T19:45:18.1889404Z libgrpc-1.71.0 | 7.6 MB | 2 | 2% 2025-05-07T19:45:18.2295916Z python-3.9.22 | 22.5 MB | | 0% 2025-05-07T19:45:18.2298779Z openjdk-23.0.1 | 181.3 MB | | 0% 2025-05-07T19:45:18.2364637Z cmake-4.0.2 | 19.4 MB | | 0% 2025-05-07T19:45:18.2464103Z bazel-7.5.0 | 47.4 MB | | 0% 2025-05-07T19:45:18.2945413Z libgrpc-1.71.0 | 7.6 MB | #######6 | 77% 2025-05-07T19:45:18.2969912Z libgrpc-1.71.0 | 7.6 MB | ########## | 100%
2025-05-07T19:45:18.3287942Z python-3.9.22 | 22.5 MB | 5 | 6% 2025-05-07T19:45:18.3319183Z openblas-0.3.29 | 5.8 MB | | 0% 2025-05-07T19:45:18.3364925Z cmake-4.0.2 | 19.4 MB | ##5 | 26% 2025-05-07T19:45:18.3365371Z openjdk-23.0.1 | 181.3 MB | 2 | 2% 2025-05-07T19:45:18.3970619Z bazel-7.5.0 | 47.4 MB | #3 | 13% 2025-05-07T19:45:18.4290858Z python-3.9.22 | 22.5 MB | #3 | 13% 2025-05-07T19:45:18.4319646Z openblas-0.3.29 | 5.8 MB | #########2 | 92% 2025-05-07T19:45:18.4366259Z cmake-4.0.2 | 19.4 MB | #####2 | 52% 2025-05-07T19:45:18.4368099Z bazel-7.5.0 | 47.4 MB | ##5 | 26% 2025-05-07T19:45:18.5016953Z openjdk-23.0.1 | 181.3 MB | 4 | 4% 2025-05-07T19:45:18.5136808Z openblas-0.3.29 | 5.8 MB | ########## | 100% 2025-05-07T19:45:18.5318402Z python-3.9.22 | 22.5 MB | ## | 20% 2025-05-07T19:45:18.5373765Z cmake-4.0.2 | 19.4 MB | #########1 | 92% 2025-05-07T19:45:18.5403499Z openjdk-23.0.1 | 181.3 MB | 6 | 6% 2025-05-07T19:45:18.5538420Z 
libopenblas-0.3.29 | 5.6 MB | | 0% 2025-05-07T19:45:18.6138393Z bazel-7.5.0 | 47.4 MB | ####2 | 42% 2025-05-07T19:45:18.6584995Z python-3.9.22 | 22.5 MB | ####7 | 47% 2025-05-07T19:45:18.6591999Z openjdk-23.0.1 | 181.3 MB | 9 | 9% 2025-05-07T19:45:18.6766565Z bazel-7.5.0 | 47.4 MB | #####5 | 56% 2025-05-07T19:45:18.6767907Z libopenblas-0.3.29 | 5.6 MB | ########## | 100% 2025-05-07T19:45:18.7137177Z libopenblas-0.3.29 | 5.6 MB | ########## | 100% 2025-05-07T19:45:18.7179117Z python-3.9.22 | 22.5 MB | #######4 | 74% 2025-05-07T19:45:18.7588796Z libcups-2.3.3 | 4.3 MB | | 0% 2025-05-07T19:45:18.7856275Z openjdk-23.0.1 | 181.3 MB | #1 | 11% 2025-05-07T19:45:18.8139308Z bazel-7.5.0 | 47.4 MB | ######7 | 67% 2025-05-07T19:45:18.8502575Z python-3.9.22 | 22.5 MB | #########4 | 94% 2025-05-07T19:45:18.8592106Z cmake-4.0.2 | 19.4 MB | ########## | 100% 2025-05-07T19:45:18.8868987Z openjdk-23.0.1 | 181.3 MB | #4 | 14% 2025-05-07T19:45:18.8939694Z bazel-7.5.0 | 47.4 MB | #######8 | 79%
2025-05-07T19:45:18.8940424Z libcups-2.3.3 | 4.3 MB | ########## | 100% 2025-05-07T19:45:18.9038273Z libcups-2.3.3 | 4.3 MB | ########## | 100% 2025-05-07T19:45:18.9371450Z libglib-2.84.0 | 3.8 MB | | 0% 2025-05-07T19:45:18.9593609Z libprotobuf-5.29.3 | 3.2 MB | | 0% 2025-05-07T19:45:18.9935023Z openjdk-23.0.1 | 181.3 MB | #7 | 17% 2025-05-07T19:45:19.0522409Z bazel-7.5.0 | 47.4 MB | ########9 | 90% 2025-05-07T19:45:19.0524100Z libglib-2.84.0 | 3.8 MB | ########## | 100% 2025-05-07T19:45:19.0593834Z libglib-2.84.0 | 3.8 MB | ########## | 100% 2025-05-07T19:45:19.0650237Z openjdk-23.0.1 | 181.3 MB | ## | 21%
2025-05-07T19:45:19.0652161Z libprotobuf-5.29.3 | 3.2 MB | ########## | 100% 2025-05-07T19:45:19.0654864Z libprotobuf-5.29.3 | 3.2 MB | ########## | 100% 2025-05-07T19:45:19.1036114Z libgrpc-1.71.0 | 7.6 MB | ########## | 100% 2025-05-07T19:45:19.1381193Z font-ttf-ubuntu-0.83 | 1.5 MB | 1 | 1% 2025-05-07T19:45:19.1580822Z tk-8.6.13 | 3.2 MB | | 0% 2025-05-07T19:45:19.1605051Z font-ttf-ubuntu-0.83 | 1.5 MB | ########## | 100%
2025-05-07T19:45:19.1932445Z openjdk-23.0.1 | 181.3 MB | ##4 | 24% 2025-05-07T19:45:19.1996851Z libopenblas-0.3.29 | 5.6 MB | ########## | 100% 2025-05-07T19:45:19.2379622Z harfbuzz-9.0.0 | 1.5 MB | 1 | 1% 2025-05-07T19:45:19.2475925Z tk-8.6.13 | 3.2 MB | ########## | 100% 2025-05-07T19:45:19.2605496Z harfbuzz-9.0.0 | 1.5 MB | ########## | 100% 2025-05-07T19:45:19.2607161Z openjdk-23.0.1 | 181.3 MB | ##8 | 28% 2025-05-07T19:45:19.2796245Z python-3.9.22 | 22.5 MB | ########## | 100%
2025-05-07T19:45:19.2940772Z libgfortran5-15.1.0 | 1.5 MB | 1 | 1% 2025-05-07T19:45:19.2970227Z krb5-1.21.3 | 1.3 MB | 1 | 1% 2025-05-07T19:45:19.3324106Z libabseil-20250127.1 | 1.3 MB | 1 | 1% 2025-05-07T19:45:19.3429960Z libgfortran5-15.1.0 | 1.5 MB | ########## | 100%
2025-05-07T19:45:19.3475147Z libabseil-20250127.1 | 1.3 MB | ########## | 100% 2025-05-07T19:45:19.3611660Z krb5-1.21.3 | 1.3 MB | ########## | 100% 2025-05-07T19:45:19.3625271Z openjdk-23.0.1 | 181.3 MB | ###2 | 32% 2025-05-07T19:45:19.3747827Z openblas-0.3.29 | 5.8 MB | ########## | 100% 2025-05-07T19:45:19.3757090Z pcre2-10.44 | 934 KB | 1 | 2% 2025-05-07T19:45:19.3875719Z cairo-1.18.0 | 961 KB | 1 | 2%
2025-05-07T19:45:19.4115940Z libsqlite-3.49.2 | 895 KB | 1 | 2% 2025-05-07T19:45:19.4148063Z cairo-1.18.0 | 961 KB | ########## | 100% 2025-05-07T19:45:19.4176867Z pcre2-10.44 | 934 KB | ########## | 100%
2025-05-07T19:45:19.4312271Z libsqlite-3.49.2 | 895 KB | ########## | 100% 2025-05-07T19:45:19.4580131Z libcups-2.3.3 | 4.3 MB | ########## | 100% 2025-05-07T19:45:19.4613282Z ... (more hidden) ... 2025-05-07T19:45:19.4899407Z openjdk-23.0.1 | 181.3 MB | ###6 | 36% 2025-05-07T19:45:19.7171504Z ... (more hidden) ...
2025-05-07T19:45:19.8182145Z openjdk-23.0.1 | 181.3 MB | ###9 | 40% 2025-05-07T19:45:19.8191190Z openjdk-23.0.1 | 181.3 MB | ####3 | 43% 2025-05-07T19:45:19.8609365Z libglib-2.84.0 | 3.8 MB | ########## | 100% 2025-05-07T19:45:19.9064420Z bazel-7.5.0 | 47.4 MB | ########## | 100% 2025-05-07T19:45:19.9070037Z font-ttf-ubuntu-0.83 | 1.5 MB | ########## | 100% 2025-05-07T19:45:19.9183023Z font-ttf-ubuntu-0.83 | 1.5 MB | ########## | 100% 2025-05-07T19:45:20.0167985Z openjdk-23.0.1 | 181.3 MB | ####7 | 47% 2025-05-07T19:45:20.0214033Z libprotobuf-5.29.3 | 3.2 MB | ########## | 100% 2025-05-07T19:45:20.1218265Z openjdk-23.0.1 | 181.3 MB | ##### | 51% 2025-05-07T19:45:20.2203247Z openjdk-23.0.1 | 181.3 MB | #####4 | 55%
2025-05-07T19:45:20.2210435Z harfbuzz-9.0.0 | 1.5 MB | ########## | 100% 2025-05-07T19:45:20.2216237Z harfbuzz-9.0.0 | 1.5 MB | ########## | 100% 2025-05-07T19:45:20.3264635Z openjdk-23.0.1 | 181.3 MB | #####8 | 59% 2025-05-07T19:45:20.5325281Z openjdk-23.0.1 | 181.3 MB | ######2 | 62% 2025-05-07T19:45:20.5769536Z openjdk-23.0.1 | 181.3 MB | ######5 | 66% 2025-05-07T19:45:20.5771366Z tk-8.6.13 | 3.2 MB | ########## | 100% 2025-05-07T19:45:20.6437590Z tk-8.6.13 | 3.2 MB | ########## | 100% 2025-05-07T19:45:20.6570093Z openjdk-23.0.1 | 181.3 MB | ######8 | 69%
2025-05-07T19:45:20.6572498Z libgfortran5-15.1.0 | 1.5 MB | ########## | 100% 2025-05-07T19:45:20.7574947Z libgfortran5-15.1.0 | 1.5 MB | ########## | 100% 2025-05-07T19:45:20.8863096Z openjdk-23.0.1 | 181.3 MB | #######1 | 72% 2025-05-07T19:45:21.0169034Z openjdk-23.0.1 | 181.3 MB | #######4 | 75% 2025-05-07T19:45:21.1178671Z openjdk-23.0.1 | 181.3 MB | #######7 | 77% 2025-05-07T19:45:21.2324478Z openjdk-23.0.1 | 181.3 MB | ########1 | 82% 2025-05-07T19:45:21.2325658Z libabseil-20250127.1 | 1.3 MB | ########## | 100% 2025-05-07T19:45:21.2385009Z libabseil-20250127.1 | 1.3 MB | ########## | 100%
2025-05-07T19:45:21.3386024Z openjdk-23.0.1 | 181.3 MB | ########5 | 85% 2025-05-07T19:45:21.4003398Z openjdk-23.0.1 | 181.3 MB | ########8 | 88% 2025-05-07T19:45:21.4004328Z krb5-1.21.3 | 1.3 MB | ########## | 100% 2025-05-07T19:45:21.4914313Z krb5-1.21.3 | 1.3 MB | ########## | 100% 2025-05-07T19:45:21.4915441Z cairo-1.18.0 | 961 KB | ########## | 100%
2025-05-07T19:45:21.5651891Z cairo-1.18.0 | 961 KB | ########## | 100% 2025-05-07T19:45:21.6653150Z openjdk-23.0.1 | 181.3 MB | #########1 | 91% 2025-05-07T19:45:21.7372309Z openjdk-23.0.1 | 181.3 MB | #########4 | 94% 2025-05-07T19:45:21.7374109Z pcre2-10.44 | 934 KB | ########## | 100% 2025-05-07T19:45:21.7748538Z pcre2-10.44 | 934 KB | ########## | 100%
2025-05-07T19:45:21.7752723Z libsqlite-3.49.2 | 895 KB | ########## | 100% 2025-05-07T19:45:21.8173220Z libsqlite-3.49.2 | 895 KB | ########## | 100% 2025-05-07T19:45:22.5096860Z openjdk-23.0.1 | 181.3 MB | #########6 | 97% 2025-05-07T19:45:22.9579035Z python-3.9.22 | 22.5 MB | ########## | 100% 2025-05-07T19:45:23.7207388Z cmake-4.0.2 | 19.4 MB | ########## | 100% 2025-05-07T19:45:23.7208207Z ... (more hidden) ...
2025-05-07T19:45:23.7208525Z 2025-05-07T19:45:23.7208530Z 2025-05-07T19:45:23.7208556Z 2025-05-07T19:45:23.7208559Z 2025-05-07T19:45:23.7208564Z 2025-05-07T19:45:23.7208567Z 2025-05-07T19:45:23.7208572Z 2025-05-07T19:45:23.7208575Z 2025-05-07T19:45:23.7208594Z 2025-05-07T19:45:23.7208598Z 2025-05-07T19:45:23.7208601Z 2025-05-07T19:45:23.7208605Z 2025-05-07T19:45:23.7208608Z 2025-05-07T19:45:23.7208612Z 2025-05-07T19:45:23.7208615Z 2025-05-07T19:45:23.7208619Z 2025-05-07T19:45:23.7208622Z 2025-05-07T19:45:23.7208625Z 2025-05-07T19:45:23.7208629Z 2025-05-07T19:45:24.2444319Z ... (more hidden) ... 2025-05-07T19:45:24.2444894Z openjdk-23.0.1 | 181.3 MB | ########## | 100% 2025-05-07T19:45:24.7887148Z openjdk-23.0.1 | 181.3 MB | ########## | 100% 2025-05-07T19:45:24.7887838Z 2025-05-07T19:45:25.4686566Z bazel-7.5.0 | 47.4 MB | ########## | 100%  2025-05-07T19:45:25.4691813Z openjdk-23.0.1 | 181.3 MB | ########## | 100% 2025-05-07T19:45:25.4692592Z 2025-05-07T19:45:25.4692608Z 2025-05-07T19:45:25.4692620Z 2025-05-07T19:45:25.4692664Z 2025-05-07T19:45:25.4692675Z 2025-05-07T19:45:25.4692687Z 2025-05-07T19:45:25.4692697Z 2025-05-07T19:45:25.4692762Z 2025-05-07T19:45:25.4692772Z 2025-05-07T19:45:25.4692784Z 2025-05-07T19:45:25.4692794Z 2025-05-07T19:45:25.4692805Z 2025-05-07T19:45:25.4692816Z 2025-05-07T19:45:25.4692828Z 2025-05-07T19:45:25.4692839Z 2025-05-07T19:45:25.4692851Z 2025-05-07T19:45:25.4692861Z 2025-05-07T19:45:25.4692873Z 2025-05-07T19:45:25.4692883Z 2025-05-07T19:45:25.4693131Z 2025-05-07T19:45:25.4694144Z  2025-05-07T19:45:25.4695629Z 2025-05-07T19:45:25.4696261Z 2025-05-07T19:45:25.4696772Z  2025-05-07T19:45:25.4697614Z 2025-05-07T19:45:25.4697661Z 2025-05-07T19:45:25.4698431Z  2025-05-07T19:45:25.4698708Z 2025-05-07T19:45:25.4698712Z 2025-05-07T19:45:25.4698716Z 2025-05-07T19:45:25.4698911Z  2025-05-07T19:45:25.4699159Z 2025-05-07T19:45:25.4699163Z 2025-05-07T19:45:25.4699166Z 2025-05-07T19:45:25.4699170Z 2025-05-07T19:45:25.4699389Z  
2025-05-07T19:45:25.4755236Z done 2025-05-07T19:45:25.7901135Z Preparing transaction: | / - done
2025-05-07T19:45:29.3530543Z Verifying transaction: | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | done 2025-05-07T19:45:32.0820452Z Executing transaction: - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | done 2025-05-07T19:45:32.5199770Z [INSTALL] Adding symlink librhash.so.0, which is needed by CMake ... 2025-05-07T19:45:34.3908306Z + ln -s /github/home/miniconda/envs/build_binary/lib/librhash.so /github/home/miniconda/envs/build_binary/lib/librhash.so.0 2025-05-07T19:45:34.3909781Z 2025-05-07T19:45:34.3922947Z 2025-05-07T19:45:34.3953961Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install build 2025-05-07T19:45:36.9108833Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 
2025-05-07T19:45:36.9110472Z 2025-05-07T19:45:36.9110609Z Collecting build 2025-05-07T19:45:36.9111194Z Downloading build-1.2.2.post1-py3-none-any.whl.metadata (6.5 kB) 2025-05-07T19:45:36.9112173Z Requirement already satisfied: packaging>=19.1 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from build) (25.0) 2025-05-07T19:45:36.9112914Z Collecting pyproject_hooks (from build) 2025-05-07T19:45:36.9113355Z Downloading pyproject_hooks-1.2.0-py3-none-any.whl.metadata (1.3 kB) 2025-05-07T19:45:36.9113862Z Collecting importlib-metadata>=4.6 (from build) 2025-05-07T19:45:36.9114749Z Downloading importlib_metadata-8.7.0-py3-none-any.whl.metadata (4.8 kB) 2025-05-07T19:45:36.9115551Z Requirement already satisfied: tomli>=1.1.0 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from build) (2.2.1) 2025-05-07T19:45:36.9116324Z Collecting zipp>=3.20 (from importlib-metadata>=4.6->build) 2025-05-07T19:45:36.9116789Z Downloading zipp-3.21.0-py3-none-any.whl.metadata (3.7 kB) 2025-05-07T19:45:36.9117265Z Downloading build-1.2.2.post1-py3-none-any.whl (22 kB) 2025-05-07T19:45:36.9117734Z Downloading importlib_metadata-8.7.0-py3-none-any.whl (27 kB) 2025-05-07T19:45:36.9118191Z Downloading zipp-3.21.0-py3-none-any.whl (9.6 kB) 2025-05-07T19:45:36.9118641Z Downloading pyproject_hooks-1.2.0-py3-none-any.whl (10 kB) 2025-05-07T19:45:36.9119174Z Installing collected packages: zipp, pyproject_hooks, importlib-metadata, build 2025-05-07T19:45:36.9119580Z 2025-05-07T19:45:36.9119926Z Successfully installed build-1.2.2.post1 importlib-metadata-8.7.0 pyproject_hooks-1.2.0 zipp-3.21.0 2025-05-07T19:45:36.9120376Z 2025-05-07T19:45:38.7879037Z /github/home/miniconda/envs/build_binary/bin/make 2025-05-07T19:45:38.7879870Z 2025-05-07T19:45:38.8461216Z [CHECK] Binary make found in PATH 2025-05-07T19:45:40.6536168Z /github/home/miniconda/envs/build_binary/bin/cmake 2025-05-07T19:45:40.6536504Z 2025-05-07T19:45:40.7277689Z [CHECK] Binary cmake found 
in PATH 2025-05-07T19:45:42.5334600Z /github/home/miniconda/envs/build_binary/bin/ninja 2025-05-07T19:45:42.5334990Z 2025-05-07T19:45:42.5902751Z [CHECK] Binary ninja found in PATH 2025-05-07T19:45:44.5204895Z [CHECK] Python (sub-)package 'click' found ... 2025-05-07T19:45:46.5700604Z [CHECK] Python (sub-)package 'hypothesis' found ... 2025-05-07T19:45:48.5183765Z [CHECK] Python (sub-)package 'jinja2' found ... 2025-05-07T19:45:50.5252550Z [CHECK] Python (sub-)package 'skbuild' found ... 2025-05-07T19:45:52.3904601Z [CHECK] Python (sub-)package 'wheel' found ... 2025-05-07T19:45:52.3905828Z [INSTALL] Successfully installed all the build tools 2025-05-07T19:45:52.3983628Z ##[group]Run . $PRELUDE; install_cuda $BUILD_ENV 12.8.0 2025-05-07T19:45:52.3984087Z . $PRELUDE; install_cuda $BUILD_ENV 12.8.0 2025-05-07T19:45:52.3984771Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:45:52.3985159Z env: 2025-05-07T19:45:52.3985398Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:45:52.3985721Z BUILD_ENV: build_binary 2025-05-07T19:45:52.3985969Z BUILD_TARGET: genai 2025-05-07T19:45:52.3986210Z BUILD_VARIANT: cuda 2025-05-07T19:45:52.3986479Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:45:52.3986770Z ##[endgroup] 2025-05-07T19:45:52.8516548Z ################################################################################ 2025-05-07T19:45:52.8517151Z # Install CUDA 2025-05-07T19:45:52.8518290Z # 2025-05-07T19:45:52.8530428Z # [2025-05-07T19:45:52.852Z] + install_cuda build_binary 12.8.0 2025-05-07T19:45:52.8531688Z ################################################################################ 2025-05-07T19:45:52.8532542Z 2025-05-07T19:45:52.8545874Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:45:52.9396285Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:45:52.9397351Z [SETUP] Cleaning up Conda packages ... 
2025-05-07T19:45:52.9403828Z + conda clean --packages --tarball -y 2025-05-07T19:45:52.9404045Z 2025-05-07T19:45:53.4905057Z Will remove 148 (613.1 MB) tarball(s). 2025-05-07T19:45:53.4905482Z Will remove 21 (76.2 MB) package(s). 2025-05-07T19:45:53.5477803Z 2025-05-07T19:45:53.5495526Z + conda clean --all -y 2025-05-07T19:45:53.5495802Z 2025-05-07T19:45:54.1536067Z There are no unused tarball(s) to remove. 2025-05-07T19:45:54.1537008Z Will remove 1 index cache(s). 2025-05-07T19:45:54.1537490Z There are no unused package(s) to remove. 2025-05-07T19:45:54.1537821Z There are no tempfile(s) to remove. 2025-05-07T19:45:54.1538149Z There are no logfile(s) to remove. 2025-05-07T19:45:54.2085592Z 2025-05-07T19:45:54.2092735Z [INSTALL] Installing CUDA 12.8.0 ... 2025-05-07T19:45:54.2122520Z [EXEC] [ATTEMPT 0/3] + conda install --force-reinstall -n build_binary -c conda-forge --override-channels -y cuda=12.8.0 2025-05-07T19:45:55.0380847Z Channels: 2025-05-07T19:45:55.0381515Z - conda-forge 2025-05-07T19:45:55.0382145Z Platform: linux-64 2025-05-07T19:46:04.6453414Z Collecting package metadata (repodata.json): - \ | / - \ | / - \ | / - \ | / done 2025-05-07T19:46:06.2110143Z Solving environment: \ | / - done 2025-05-07T19:46:06.3473687Z 2025-05-07T19:46:06.3474620Z ## Package Plan ## 2025-05-07T19:46:06.3474880Z 2025-05-07T19:46:06.3475117Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:46:06.3475493Z 2025-05-07T19:46:06.3475617Z added / updated specs: 2025-05-07T19:46:06.3475915Z - cuda=12.8.0 2025-05-07T19:46:06.3476101Z 2025-05-07T19:46:06.3476107Z 2025-05-07T19:46:06.3476245Z The following packages will be downloaded: 2025-05-07T19:46:06.3476529Z 2025-05-07T19:46:06.3476707Z package | build 2025-05-07T19:46:06.3477073Z ---------------------------|----------------- 2025-05-07T19:46:06.3477497Z attr-2.5.1 | h166bdaf_1 69 KB conda-forge 2025-05-07T19:46:06.3477966Z binutils-2.40 | h4852527_7 31 KB conda-forge 2025-05-07T19:46:06.3478464Z 
c-compiler-1.5.2 | h0b41bf4_0 6 KB conda-forge 2025-05-07T19:46:06.3478949Z cuda-12.8.0 | ha804496_0 26 KB conda-forge 2025-05-07T19:46:06.3479422Z cuda-cccl_linux-64-12.8.55 | ha770c72_1 1.0 MB conda-forge 2025-05-07T19:46:06.3480086Z cuda-command-line-tools-12.8.0| ha770c72_0 20 KB conda-forge 2025-05-07T19:46:06.3480676Z cuda-compiler-12.8.0 | hbad6d8a_0 20 KB conda-forge 2025-05-07T19:46:06.3481204Z cuda-crt-dev_linux-64-12.8.61| ha770c72_1 90 KB conda-forge 2025-05-07T19:46:06.3482120Z cuda-crt-tools-12.8.61 | ha770c72_1 27 KB conda-forge 2025-05-07T19:46:06.3482631Z cuda-cudart-12.8.57 | h5888daf_1 22 KB conda-forge 2025-05-07T19:46:06.3483170Z cuda-cudart-dev-12.8.57 | h5888daf_1 23 KB conda-forge 2025-05-07T19:46:06.3483718Z cuda-cudart-dev_linux-64-12.8.57| h3f2d84a_1 377 KB conda-forge 2025-05-07T19:46:06.3484311Z cuda-cudart-static-12.8.57 | h5888daf_1 22 KB conda-forge 2025-05-07T19:46:06.3484909Z cuda-cudart-static_linux-64-12.8.57| h3f2d84a_1 950 KB conda-forge 2025-05-07T19:46:06.3485479Z cuda-cudart_linux-64-12.8.57| h3f2d84a_1 188 KB conda-forge 2025-05-07T19:46:06.3486028Z cuda-cuobjdump-12.8.55 | hbd13f7d_0 227 KB conda-forge 2025-05-07T19:46:06.3486525Z cuda-cupti-12.8.57 | hbd13f7d_0 1.8 MB conda-forge 2025-05-07T19:46:06.3487056Z cuda-cupti-dev-12.8.57 | h5888daf_0 4.0 MB conda-forge 2025-05-07T19:46:06.3487599Z cuda-cuxxfilt-12.8.55 | hbd13f7d_0 211 KB conda-forge 2025-05-07T19:46:06.3488112Z cuda-driver-dev-12.8.57 | h5888daf_1 22 KB conda-forge 2025-05-07T19:46:06.3488680Z cuda-driver-dev_linux-64-12.8.90| h3f2d84a_1 36 KB conda-forge 2025-05-07T19:46:06.3489198Z cuda-gdb-12.8.55 | h50b4baa_0 353 KB conda-forge 2025-05-07T19:46:06.3489712Z cuda-libraries-12.8.0 | ha770c72_0 20 KB conda-forge 2025-05-07T19:46:06.3490236Z cuda-libraries-dev-12.8.0 | ha770c72_0 20 KB conda-forge 2025-05-07T19:46:06.3490781Z cuda-nsight-12.8.55 | h7938cbb_0 113.2 MB conda-forge 2025-05-07T19:46:06.3491299Z cuda-nvcc-12.8.61 | hcdd1206_0 23 KB conda-forge 
2025-05-07T19:46:06.3491805Z cuda-nvcc-dev_linux-64-12.8.61| he91c749_1 12.7 MB conda-forge 2025-05-07T19:46:06.3492520Z cuda-nvcc-impl-12.8.61 | h85509e4_1 25 KB conda-forge 2025-05-07T19:46:06.3493039Z cuda-nvcc-tools-12.8.61 | he02047a_1 24.5 MB conda-forge 2025-05-07T19:46:06.3493593Z cuda-nvcc_linux-64-12.8.61 | h04802cd_0 25 KB conda-forge 2025-05-07T19:46:06.3494109Z cuda-nvdisasm-12.8.55 | hbd13f7d_0 4.9 MB conda-forge 2025-05-07T19:46:06.3494646Z cuda-nvml-dev-12.8.55 | hbd13f7d_0 134 KB conda-forge 2025-05-07T19:46:06.3495143Z cuda-nvprof-12.8.57 | hbd13f7d_0 2.5 MB conda-forge 2025-05-07T19:46:06.3495614Z cuda-nvprune-12.8.55 | hbd13f7d_0 68 KB conda-forge 2025-05-07T19:46:06.3496096Z cuda-nvrtc-12.8.61 | hbd13f7d_0 63.1 MB conda-forge 2025-05-07T19:46:06.3496589Z cuda-nvrtc-dev-12.8.61 | h5888daf_0 34 KB conda-forge 2025-05-07T19:46:06.3497197Z cuda-nvtx-12.8.55 | hbd13f7d_0 31 KB conda-forge 2025-05-07T19:46:06.3497757Z cuda-nvvm-dev_linux-64-12.8.61| ha770c72_1 25 KB conda-forge 2025-05-07T19:46:06.3498294Z cuda-nvvm-impl-12.8.61 | he02047a_1 20.8 MB conda-forge 2025-05-07T19:46:06.3498845Z cuda-nvvm-tools-12.8.61 | he02047a_1 23.5 MB conda-forge 2025-05-07T19:46:06.3499345Z cuda-nvvp-12.8.57 | hbd13f7d_0 112.4 MB conda-forge 2025-05-07T19:46:06.3499869Z cuda-opencl-12.8.55 | hbd13f7d_0 29 KB conda-forge 2025-05-07T19:46:06.3500375Z cuda-opencl-dev-12.8.55 | h5888daf_0 95 KB conda-forge 2025-05-07T19:46:06.3500933Z cuda-profiler-api-12.8.55 | h7938cbb_0 22 KB conda-forge 2025-05-07T19:46:06.3501468Z cuda-runtime-12.8.0 | ha804496_0 20 KB conda-forge 2025-05-07T19:46:06.3501985Z cuda-sanitizer-api-12.8.55 | hbd13f7d_0 8.8 MB conda-forge 2025-05-07T19:46:06.3502620Z cuda-toolkit-12.8.0 | ha804496_0 20 KB conda-forge 2025-05-07T19:46:06.3503098Z cuda-tools-12.8.0 | ha770c72_0 19 KB conda-forge 2025-05-07T19:46:06.3503571Z cuda-version-12.8 | h5d125a7_3 21 KB conda-forge 2025-05-07T19:46:06.3504057Z cuda-visual-tools-12.8.0 | ha770c72_0 20 KB conda-forge 
2025-05-07T19:46:06.3504563Z cxx-compiler-1.5.2 | hf52228f_0 6 KB conda-forge 2025-05-07T19:46:06.3505011Z dbus-1.13.6 | h5008d03_3 604 KB conda-forge 2025-05-07T19:46:06.3505449Z expat-2.7.0 | h5888daf_0 137 KB conda-forge 2025-05-07T19:46:06.3505862Z gcc-11.4.0 | h602e360_13 49 KB conda-forge 2025-05-07T19:46:06.3506310Z gds-tools-1.13.0.11 | h5888daf_0 37.9 MB conda-forge 2025-05-07T19:46:06.3506756Z gmp-6.3.0 | hac33072_2 449 KB conda-forge 2025-05-07T19:46:06.3507158Z gxx-11.4.0 | h602e360_13 49 KB conda-forge 2025-05-07T19:46:06.3507599Z libcap-2.75 | h39aace5_0 118 KB conda-forge 2025-05-07T19:46:06.3508042Z libcublas-12.8.3.14 | h9ab20c4_0 460.2 MB conda-forge 2025-05-07T19:46:06.3508551Z libcublas-dev-12.8.3.14 | h9ab20c4_0 89 KB conda-forge 2025-05-07T19:46:06.3509022Z libcufft-11.3.3.41 | hbd13f7d_0 147.4 MB conda-forge 2025-05-07T19:46:06.3509519Z libcufft-dev-11.3.3.41 | h5888daf_0 33 KB conda-forge 2025-05-07T19:46:06.3510094Z libcufile-1.13.0.11 | h12f29b5_0 939 KB conda-forge 2025-05-07T19:46:06.3510529Z libcufile-dev-1.13.0.11 | h5888daf_0 35 KB conda-forge 2025-05-07T19:46:06.3510985Z libcurand-10.3.9.55 | hbd13f7d_0 43.6 MB conda-forge 2025-05-07T19:46:06.3511484Z libcurand-dev-10.3.9.55 | h5888daf_0 265 KB conda-forge 2025-05-07T19:46:06.3511949Z libcusolver-11.7.2.55 | h9ab20c4_0 156.9 MB conda-forge 2025-05-07T19:46:06.3512416Z libcusolver-dev-11.7.2.55 | h9ab20c4_0 59 KB conda-forge 2025-05-07T19:46:06.3512869Z libcusparse-12.5.7.53 | hbd13f7d_0 164.9 MB conda-forge 2025-05-07T19:46:06.3513343Z libcusparse-dev-12.5.7.53 | h5888daf_0 51 KB conda-forge 2025-05-07T19:46:06.3513798Z libgcrypt-lib-1.11.0 | hb9d3cd8_2 572 KB conda-forge 2025-05-07T19:46:06.3514236Z libglvnd-1.7.0 | ha4b6fd6_2 129 KB conda-forge 2025-05-07T19:46:06.3514653Z libgpg-error-1.55 | h3f2d84a_0 305 KB conda-forge 2025-05-07T19:46:06.3515074Z libnl-3.11.0 | hb9d3cd8_0 724 KB conda-forge 2025-05-07T19:46:06.3515486Z libnpp-12.3.3.65 | hbd13f7d_0 130.6 MB conda-forge 
2025-05-07T19:46:06.3515906Z libnpp-dev-12.3.3.65 | h5888daf_0 443 KB conda-forge 2025-05-07T19:46:06.3516335Z libnuma-2.0.18 | h4ab18f5_2 42 KB conda-forge 2025-05-07T19:46:06.3516751Z libnvfatbin-12.8.55 | hbd13f7d_0 793 KB conda-forge 2025-05-07T19:46:06.3517212Z libnvfatbin-dev-12.8.55 | h5888daf_0 26 KB conda-forge 2025-05-07T19:46:06.3517664Z libnvjitlink-12.8.61 | hbd13f7d_0 28.7 MB conda-forge 2025-05-07T19:46:06.3518133Z libnvjitlink-dev-12.8.61 | h5888daf_0 25 KB conda-forge 2025-05-07T19:46:06.3518593Z libnvjpeg-12.3.5.57 | h97fd463_0 3.0 MB conda-forge 2025-05-07T19:46:06.3519031Z libnvjpeg-dev-12.3.5.57 | ha770c72_0 31 KB conda-forge 2025-05-07T19:46:06.3519516Z libopengl-1.7.0 | ha4b6fd6_2 50 KB conda-forge 2025-05-07T19:46:06.3520035Z libsystemd0-257.4 | h4e0b6ca_1 477 KB conda-forge 2025-05-07T19:46:06.3520466Z libudev1-257.4 | hbe16f8c_1 141 KB conda-forge 2025-05-07T19:46:06.3520905Z libxkbcommon-1.7.0 | h2c5496b_1 579 KB conda-forge 2025-05-07T19:46:06.3521347Z libxkbfile-1.1.0 | h166bdaf_1 111 KB conda-forge 2025-05-07T19:46:06.3521740Z lz4-c-1.10.0 | h5888daf_1 163 KB conda-forge 2025-05-07T19:46:06.3522186Z nsight-compute-2025.1.0.14 | hb5ebaad_0 320.6 MB conda-forge 2025-05-07T19:46:06.3522612Z nspr-4.36 | h5888daf_0 225 KB conda-forge 2025-05-07T19:46:06.3522998Z nss-3.111 | h159eef7_0 1.9 MB conda-forge 2025-05-07T19:46:06.3523393Z ocl-icd-2.3.3 | hb9d3cd8_0 104 KB conda-forge 2025-05-07T19:46:06.3523829Z opencl-headers-2024.10.24 | h5888daf_0 53 KB conda-forge 2025-05-07T19:46:06.3524286Z rdma-core-57.0 | h5888daf_0 1.2 MB conda-forge 2025-05-07T19:46:06.3524686Z wayland-1.23.1 | h3e06ad9_0 314 KB conda-forge 2025-05-07T19:46:06.3525093Z xcb-util-0.4.1 | hb711507_2 19 KB conda-forge 2025-05-07T19:46:06.3525515Z xcb-util-cursor-0.1.5 | hb9d3cd8_0 20 KB conda-forge 2025-05-07T19:46:06.3525975Z xcb-util-image-0.4.0 | hb711507_2 24 KB conda-forge 2025-05-07T19:46:06.3526434Z xcb-util-keysyms-0.4.1 | hb711507_0 14 KB conda-forge 
2025-05-07T19:46:06.3526901Z xcb-util-renderutil-0.3.10 | hb711507_0 17 KB conda-forge 2025-05-07T19:46:06.3527358Z xcb-util-wm-0.4.2 | hb711507_0 50 KB conda-forge 2025-05-07T19:46:06.3527793Z xkeyboard-config-2.44 | hb9d3cd8_0 384 KB conda-forge 2025-05-07T19:46:06.3528284Z xorg-libxcomposite-0.4.6 | hb9d3cd8_2 13 KB conda-forge 2025-05-07T19:46:06.3529415Z xorg-libxdamage-1.1.6 | hb9d3cd8_0 13 KB conda-forge 2025-05-07T19:46:06.3529844Z ------------------------------------------------------------ 2025-05-07T19:46:06.3530209Z Total: 1.86 GB 2025-05-07T19:46:06.3530428Z 2025-05-07T19:46:06.3530557Z The following NEW packages will be INSTALLED: 2025-05-07T19:46:06.3530799Z 2025-05-07T19:46:06.3530978Z attr conda-forge/linux-64::attr-2.5.1-h166bdaf_1 2025-05-07T19:46:06.3531405Z binutils conda-forge/linux-64::binutils-2.40-h4852527_7 2025-05-07T19:46:06.3531886Z c-compiler conda-forge/linux-64::c-compiler-1.5.2-h0b41bf4_0 2025-05-07T19:46:06.3532339Z cuda conda-forge/noarch::cuda-12.8.0-ha804496_0 2025-05-07T19:46:06.3532820Z cuda-cccl_linux-64 conda-forge/noarch::cuda-cccl_linux-64-12.8.55-ha770c72_1 2025-05-07T19:46:06.3533457Z cuda-command-line~ conda-forge/linux-64::cuda-command-line-tools-12.8.0-ha770c72_0 2025-05-07T19:46:06.3534057Z cuda-compiler conda-forge/noarch::cuda-compiler-12.8.0-hbad6d8a_0 2025-05-07T19:46:06.3534631Z cuda-crt-dev_linu~ conda-forge/noarch::cuda-crt-dev_linux-64-12.8.61-ha770c72_1 2025-05-07T19:46:06.3535219Z cuda-crt-tools conda-forge/linux-64::cuda-crt-tools-12.8.61-ha770c72_1 2025-05-07T19:46:06.3535746Z cuda-cudart conda-forge/linux-64::cuda-cudart-12.8.57-h5888daf_1 2025-05-07T19:46:06.3536291Z cuda-cudart-dev conda-forge/linux-64::cuda-cudart-dev-12.8.57-h5888daf_1 2025-05-07T19:46:06.3536884Z cuda-cudart-dev_l~ conda-forge/noarch::cuda-cudart-dev_linux-64-12.8.57-h3f2d84a_1 2025-05-07T19:46:06.3537767Z cuda-cudart-static conda-forge/linux-64::cuda-cudart-static-12.8.57-h5888daf_1 2025-05-07T19:46:06.3538436Z cuda-cudart-stati~ 
conda-forge/noarch::cuda-cudart-static_linux-64-12.8.57-h3f2d84a_1 2025-05-07T19:46:06.3539081Z cuda-cudart_linux~ conda-forge/noarch::cuda-cudart_linux-64-12.8.57-h3f2d84a_1 2025-05-07T19:46:06.3539771Z cuda-cuobjdump conda-forge/linux-64::cuda-cuobjdump-12.8.55-hbd13f7d_0 2025-05-07T19:46:06.3540315Z cuda-cupti conda-forge/linux-64::cuda-cupti-12.8.57-hbd13f7d_0 2025-05-07T19:46:06.3540856Z cuda-cupti-dev conda-forge/linux-64::cuda-cupti-dev-12.8.57-h5888daf_0 2025-05-07T19:46:06.3541433Z cuda-cuxxfilt conda-forge/linux-64::cuda-cuxxfilt-12.8.55-hbd13f7d_0 2025-05-07T19:46:06.3542001Z cuda-driver-dev conda-forge/linux-64::cuda-driver-dev-12.8.57-h5888daf_1 2025-05-07T19:46:06.3542628Z cuda-driver-dev_l~ conda-forge/noarch::cuda-driver-dev_linux-64-12.8.90-h3f2d84a_1 2025-05-07T19:46:06.3543185Z cuda-gdb conda-forge/linux-64::cuda-gdb-12.8.55-h50b4baa_0 2025-05-07T19:46:06.3543716Z cuda-libraries conda-forge/linux-64::cuda-libraries-12.8.0-ha770c72_0 2025-05-07T19:46:06.3544325Z cuda-libraries-dev conda-forge/linux-64::cuda-libraries-dev-12.8.0-ha770c72_0 2025-05-07T19:46:06.3544912Z cuda-nsight conda-forge/linux-64::cuda-nsight-12.8.55-h7938cbb_0 2025-05-07T19:46:06.3545439Z cuda-nvcc conda-forge/linux-64::cuda-nvcc-12.8.61-hcdd1206_0 2025-05-07T19:46:06.3545994Z cuda-nvcc-dev_lin~ conda-forge/noarch::cuda-nvcc-dev_linux-64-12.8.61-he91c749_1 2025-05-07T19:46:06.3546608Z cuda-nvcc-impl conda-forge/linux-64::cuda-nvcc-impl-12.8.61-h85509e4_1 2025-05-07T19:46:06.3547197Z cuda-nvcc-tools conda-forge/linux-64::cuda-nvcc-tools-12.8.61-he02047a_1 2025-05-07T19:46:06.3547781Z cuda-nvcc_linux-64 conda-forge/linux-64::cuda-nvcc_linux-64-12.8.61-h04802cd_0 2025-05-07T19:46:06.3548372Z cuda-nvdisasm conda-forge/linux-64::cuda-nvdisasm-12.8.55-hbd13f7d_0 2025-05-07T19:46:06.3548930Z cuda-nvml-dev conda-forge/linux-64::cuda-nvml-dev-12.8.55-hbd13f7d_0 2025-05-07T19:46:06.3549517Z cuda-nvprof conda-forge/linux-64::cuda-nvprof-12.8.57-hbd13f7d_0 2025-05-07T19:46:06.3550217Z 
cuda-nvprune conda-forge/linux-64::cuda-nvprune-12.8.55-hbd13f7d_0 2025-05-07T19:46:06.3550820Z cuda-nvrtc conda-forge/linux-64::cuda-nvrtc-12.8.61-hbd13f7d_0 2025-05-07T19:46:06.3551394Z cuda-nvrtc-dev conda-forge/linux-64::cuda-nvrtc-dev-12.8.61-h5888daf_0 2025-05-07T19:46:06.3551935Z cuda-nvtx conda-forge/linux-64::cuda-nvtx-12.8.55-hbd13f7d_0 2025-05-07T19:46:06.3552579Z cuda-nvvm-dev_lin~ conda-forge/noarch::cuda-nvvm-dev_linux-64-12.8.61-ha770c72_1 2025-05-07T19:46:06.3553146Z cuda-nvvm-impl conda-forge/linux-64::cuda-nvvm-impl-12.8.61-he02047a_1 2025-05-07T19:46:06.3553680Z cuda-nvvm-tools conda-forge/linux-64::cuda-nvvm-tools-12.8.61-he02047a_1 2025-05-07T19:46:06.3554196Z cuda-nvvp conda-forge/linux-64::cuda-nvvp-12.8.57-hbd13f7d_0 2025-05-07T19:46:06.3554669Z cuda-opencl conda-forge/linux-64::cuda-opencl-12.8.55-hbd13f7d_0 2025-05-07T19:46:06.3555197Z cuda-opencl-dev conda-forge/linux-64::cuda-opencl-dev-12.8.55-h5888daf_0 2025-05-07T19:46:06.3555778Z cuda-profiler-api conda-forge/linux-64::cuda-profiler-api-12.8.55-h7938cbb_0 2025-05-07T19:46:06.3556316Z cuda-runtime conda-forge/noarch::cuda-runtime-12.8.0-ha804496_0 2025-05-07T19:46:06.3556875Z cuda-sanitizer-api conda-forge/linux-64::cuda-sanitizer-api-12.8.55-hbd13f7d_0 2025-05-07T19:46:06.3557417Z cuda-toolkit conda-forge/noarch::cuda-toolkit-12.8.0-ha804496_0 2025-05-07T19:46:06.3557899Z cuda-tools conda-forge/linux-64::cuda-tools-12.8.0-ha770c72_0 2025-05-07T19:46:06.3558379Z cuda-version conda-forge/noarch::cuda-version-12.8-h5d125a7_3 2025-05-07T19:46:06.3558900Z cuda-visual-tools conda-forge/linux-64::cuda-visual-tools-12.8.0-ha770c72_0 2025-05-07T19:46:06.3559452Z cxx-compiler conda-forge/linux-64::cxx-compiler-1.5.2-hf52228f_0 2025-05-07T19:46:06.3559893Z dbus conda-forge/linux-64::dbus-1.13.6-h5008d03_3 2025-05-07T19:46:06.3560342Z expat conda-forge/linux-64::expat-2.7.0-h5888daf_0 2025-05-07T19:46:06.3560850Z gcc conda-forge/linux-64::gcc-11.4.0-h602e360_13 2025-05-07T19:46:06.3561295Z 
gds-tools conda-forge/linux-64::gds-tools-1.13.0.11-h5888daf_0 2025-05-07T19:46:06.3561731Z gmp conda-forge/linux-64::gmp-6.3.0-hac33072_2 2025-05-07T19:46:06.3562123Z gxx conda-forge/linux-64::gxx-11.4.0-h602e360_13 2025-05-07T19:46:06.3562575Z libcap conda-forge/linux-64::libcap-2.75-h39aace5_0 2025-05-07T19:46:06.3563250Z libcublas conda-forge/linux-64::libcublas-12.8.3.14-h9ab20c4_0 2025-05-07T19:46:06.3563812Z libcublas-dev conda-forge/linux-64::libcublas-dev-12.8.3.14-h9ab20c4_0 2025-05-07T19:46:06.3564400Z libcufft conda-forge/linux-64::libcufft-11.3.3.41-hbd13f7d_0 2025-05-07T19:46:06.3564930Z libcufft-dev conda-forge/linux-64::libcufft-dev-11.3.3.41-h5888daf_0 2025-05-07T19:46:06.3565500Z libcufile conda-forge/linux-64::libcufile-1.13.0.11-h12f29b5_0 2025-05-07T19:46:06.3566240Z libcufile-dev conda-forge/linux-64::libcufile-dev-1.13.0.11-h5888daf_0 2025-05-07T19:46:06.3566796Z libcurand conda-forge/linux-64::libcurand-10.3.9.55-hbd13f7d_0 2025-05-07T19:46:06.3567539Z libcurand-dev conda-forge/linux-64::libcurand-dev-10.3.9.55-h5888daf_0 2025-05-07T19:46:06.3568102Z libcusolver conda-forge/linux-64::libcusolver-11.7.2.55-h9ab20c4_0 2025-05-07T19:46:06.3568691Z libcusolver-dev conda-forge/linux-64::libcusolver-dev-11.7.2.55-h9ab20c4_0 2025-05-07T19:46:06.3569276Z libcusparse conda-forge/linux-64::libcusparse-12.5.7.53-hbd13f7d_0 2025-05-07T19:46:06.3569849Z libcusparse-dev conda-forge/linux-64::libcusparse-dev-12.5.7.53-h5888daf_0 2025-05-07T19:46:06.3570439Z libgcrypt-lib conda-forge/linux-64::libgcrypt-lib-1.11.0-hb9d3cd8_2 2025-05-07T19:46:06.3570951Z libglvnd conda-forge/linux-64::libglvnd-1.7.0-ha4b6fd6_2 2025-05-07T19:46:06.3571461Z libgpg-error conda-forge/linux-64::libgpg-error-1.55-h3f2d84a_0 2025-05-07T19:46:06.3572080Z libnl conda-forge/linux-64::libnl-3.11.0-hb9d3cd8_0 2025-05-07T19:46:06.3572527Z libnpp conda-forge/linux-64::libnpp-12.3.3.65-hbd13f7d_0 2025-05-07T19:46:06.3573030Z libnpp-dev conda-forge/linux-64::libnpp-dev-12.3.3.65-h5888daf_0 
2025-05-07T19:46:06.3573521Z libnuma conda-forge/linux-64::libnuma-2.0.18-h4ab18f5_2 2025-05-07T19:46:06.3574032Z libnvfatbin conda-forge/linux-64::libnvfatbin-12.8.55-hbd13f7d_0 2025-05-07T19:46:06.3574643Z libnvfatbin-dev conda-forge/linux-64::libnvfatbin-dev-12.8.55-h5888daf_0 2025-05-07T19:46:06.3575241Z libnvjitlink conda-forge/linux-64::libnvjitlink-12.8.61-hbd13f7d_0 2025-05-07T19:46:06.3575875Z libnvjitlink-dev conda-forge/linux-64::libnvjitlink-dev-12.8.61-h5888daf_0 2025-05-07T19:46:06.3576460Z libnvjpeg conda-forge/linux-64::libnvjpeg-12.3.5.57-h97fd463_0 2025-05-07T19:46:06.3577112Z libnvjpeg-dev conda-forge/linux-64::libnvjpeg-dev-12.3.5.57-ha770c72_0 2025-05-07T19:46:06.3577722Z libopengl conda-forge/linux-64::libopengl-1.7.0-ha4b6fd6_2 2025-05-07T19:46:06.3578254Z libsystemd0 conda-forge/linux-64::libsystemd0-257.4-h4e0b6ca_1 2025-05-07T19:46:06.3578817Z libudev1 conda-forge/linux-64::libudev1-257.4-hbe16f8c_1 2025-05-07T19:46:06.3579353Z libxkbcommon conda-forge/linux-64::libxkbcommon-1.7.0-h2c5496b_1 2025-05-07T19:46:06.3579934Z libxkbfile conda-forge/linux-64::libxkbfile-1.1.0-h166bdaf_1 2025-05-07T19:46:06.3580456Z lz4-c conda-forge/linux-64::lz4-c-1.10.0-h5888daf_1 2025-05-07T19:46:06.3581010Z nsight-compute conda-forge/linux-64::nsight-compute-2025.1.0.14-hb5ebaad_0 2025-05-07T19:46:06.3581591Z nspr conda-forge/linux-64::nspr-4.36-h5888daf_0 2025-05-07T19:46:06.3582019Z nss conda-forge/linux-64::nss-3.111-h159eef7_0 2025-05-07T19:46:06.3582488Z ocl-icd conda-forge/linux-64::ocl-icd-2.3.3-hb9d3cd8_0 2025-05-07T19:46:06.3583155Z opencl-headers conda-forge/linux-64::opencl-headers-2024.10.24-h5888daf_0 2025-05-07T19:46:06.3583702Z rdma-core conda-forge/linux-64::rdma-core-57.0-h5888daf_0 2025-05-07T19:46:06.3584184Z wayland conda-forge/linux-64::wayland-1.23.1-h3e06ad9_0 2025-05-07T19:46:06.3584636Z xcb-util conda-forge/linux-64::xcb-util-0.4.1-hb711507_2 2025-05-07T19:46:06.3585165Z xcb-util-cursor 
conda-forge/linux-64::xcb-util-cursor-0.1.5-hb9d3cd8_0 2025-05-07T19:46:06.3585724Z xcb-util-image conda-forge/linux-64::xcb-util-image-0.4.0-hb711507_2 2025-05-07T19:46:06.3586309Z xcb-util-keysyms conda-forge/linux-64::xcb-util-keysyms-0.4.1-hb711507_0 2025-05-07T19:46:06.3586935Z xcb-util-renderut~ conda-forge/linux-64::xcb-util-renderutil-0.3.10-hb711507_0 2025-05-07T19:46:06.3587500Z xcb-util-wm conda-forge/linux-64::xcb-util-wm-0.4.2-hb711507_0 2025-05-07T19:46:06.3588054Z xkeyboard-config conda-forge/linux-64::xkeyboard-config-2.44-hb9d3cd8_0 2025-05-07T19:46:06.3588678Z xorg-libxcomposite conda-forge/linux-64::xorg-libxcomposite-0.4.6-hb9d3cd8_2 2025-05-07T19:46:06.3589295Z xorg-libxdamage conda-forge/linux-64::xorg-libxdamage-1.1.6-hb9d3cd8_0 2025-05-07T19:46:06.3589641Z 2025-05-07T19:46:06.3589673Z 2025-05-07T19:46:06.3589677Z 2025-05-07T19:46:06.3589825Z Downloading and Extracting Packages: ...working... 2025-05-07T19:46:06.3590218Z libcublas-12.8.3.14 | 460.2 MB | | 0% 2025-05-07T19:46:06.3590479Z 2025-05-07T19:46:06.3590911Z nsight-compute-2025. 
| 320.6 MB | | 0%
2025-05-07T19:46:06.3601178Z libcusparse-12.5.7.5 | 164.9 MB | | 0%
2025-05-07T19:46:06.3619027Z libcusolver-11.7.2.5 | 156.9 MB | | 0%
2025-05-07T19:46:06.3627959Z libcufft-11.3.3.41 | 147.4 MB | | 0%
2025-05-07T19:46:06.3628534Z libnpp-12.3.3.65 | 130.6 MB | | 0%
2025-05-07T19:46:06.3629358Z cuda-nsight-12.8.55 | 113.2 MB | | 0%
2025-05-07T19:46:06.3629956Z cuda-nvvp-12.8.57 | 112.4 MB | | 0%
2025-05-07T19:46:06.3632504Z cuda-nvrtc-12.8.61 | 63.1 MB | | 0%
2025-05-07T19:46:06.3633266Z libcurand-10.3.9.55 | 43.6 MB | | 0%
2025-05-07T19:46:06.3633928Z gds-tools-1.13.0.11 | 37.9 MB | | 0%
2025-05-07T19:46:06.3640630Z libnvjitlink-12.8.61 | 28.7 MB | | 0%
2025-05-07T19:46:06.3641403Z cuda-nvcc-tools-12.8 | 24.5 MB | | 0%
2025-05-07T19:46:06.3642382Z cuda-nvvm-tools-12.8 | 23.5 MB | | 0%
2025-05-07T19:46:06.3645437Z cuda-nvvm-impl-12.8. | 20.8 MB | | 0%
2025-05-07T19:46:06.3647656Z cuda-nvcc-dev_linux- | 12.7 MB | | 0%
2025-05-07T19:46:06.3648481Z cuda-sanitizer-api-1 | 8.8 MB | | 0%
2025-05-07T19:46:06.3649251Z cuda-nvdisasm-12.8.5 | 4.9 MB | | 0%
2025-05-07T19:46:06.3650173Z cuda-cupti-dev-12.8. | 4.0 MB | | 0%
2025-05-07T19:46:06.4573166Z ... (more hidden) ...
2025-05-07T19:46:06.4578884Z libcublas-12.8.3.14 | 460.2 MB | | 1%
2025-05-07T19:46:06.4586091Z nsight-compute-2025. | 320.6 MB | | 1%
2025-05-07T19:46:06.4604789Z libcusparse-12.5.7.5 | 164.9 MB | | 1%
2025-05-07T19:46:06.4615411Z libcusolver-11.7.2.5 | 156.9 MB | 1 | 1%
2025-05-07T19:46:06.5575381Z libcufft-11.3.3.41 | 147.4 MB | 1 | 1%
2025-05-07T19:46:06.5579452Z libcublas-12.8.3.14 | 460.2 MB | 2 | 2%
2025-05-07T19:46:06.5587885Z nsight-compute-2025.
| 320.6 MB | 2 | 2%  2025-05-07T19:46:06.5588405Z 2025-05-07T19:46:06.5588429Z 2025-05-07T19:46:06.5604658Z libcusparse-12.5.7.5 | 164.9 MB | 3 | 4%  2025-05-07T19:46:06.5605581Z 2025-05-07T19:46:06.5605595Z 2025-05-07T19:46:06.5605606Z 2025-05-07T19:46:06.5617128Z libcusolver-11.7.2.5 | 156.9 MB | 5 | 5%  2025-05-07T19:46:06.5617504Z 2025-05-07T19:46:06.5617509Z 2025-05-07T19:46:06.5617514Z 2025-05-07T19:46:06.5617518Z 2025-05-07T19:46:06.6575943Z libcufft-11.3.3.41 | 147.4 MB | 6 | 6%  2025-05-07T19:46:06.6581218Z libcublas-12.8.3.14 | 460.2 MB | 3 | 3% 2025-05-07T19:46:06.6581496Z 2025-05-07T19:46:06.6586352Z nsight-compute-2025. | 320.6 MB | 4 | 4%  2025-05-07T19:46:06.6586661Z 2025-05-07T19:46:06.6586666Z 2025-05-07T19:46:06.6607544Z libcusparse-12.5.7.5 | 164.9 MB | 6 | 7%  2025-05-07T19:46:06.6608428Z 2025-05-07T19:46:06.6608442Z 2025-05-07T19:46:06.6608481Z 2025-05-07T19:46:06.6617951Z libcusolver-11.7.2.5 | 156.9 MB | 9 | 9%  2025-05-07T19:46:06.6618821Z 2025-05-07T19:46:06.6618833Z 2025-05-07T19:46:06.6618843Z 2025-05-07T19:46:06.6619040Z 2025-05-07T19:46:06.7575513Z libcufft-11.3.3.41 | 147.4 MB | # | 11%  2025-05-07T19:46:06.7583001Z libcublas-12.8.3.14 | 460.2 MB | 4 | 5% 2025-05-07T19:46:06.7584244Z 2025-05-07T19:46:06.7592778Z nsight-compute-2025. | 320.6 MB | 6 | 6%  2025-05-07T19:46:06.7593077Z 2025-05-07T19:46:06.7593081Z 2025-05-07T19:46:06.7609078Z libcusparse-12.5.7.5 | 164.9 MB | 9 | 10%  2025-05-07T19:46:06.7610004Z 2025-05-07T19:46:06.7610018Z 2025-05-07T19:46:06.7610030Z 2025-05-07T19:46:06.7651507Z libcusolver-11.7.2.5 | 156.9 MB | #3 | 14%  2025-05-07T19:46:06.7651859Z 2025-05-07T19:46:06.7651864Z 2025-05-07T19:46:06.7651867Z 2025-05-07T19:46:06.7651871Z 2025-05-07T19:46:06.8576560Z libcufft-11.3.3.41 | 147.4 MB | #4 | 14%  2025-05-07T19:46:06.8584807Z libcublas-12.8.3.14 | 460.2 MB | 6 | 6% 2025-05-07T19:46:06.8585090Z 2025-05-07T19:46:06.8595377Z nsight-compute-2025. 
| 320.6 MB | 8 | 8%  2025-05-07T19:46:06.8595693Z 2025-05-07T19:46:06.8595700Z 2025-05-07T19:46:06.8610111Z libcusparse-12.5.7.5 | 164.9 MB | #2 | 13%  2025-05-07T19:46:06.8610434Z 2025-05-07T19:46:06.8610439Z 2025-05-07T19:46:06.8610609Z 2025-05-07T19:46:06.8653768Z libcusolver-11.7.2.5 | 156.9 MB | #7 | 18%  2025-05-07T19:46:06.8654127Z 2025-05-07T19:46:06.8654134Z 2025-05-07T19:46:06.8654140Z 2025-05-07T19:46:06.8654146Z 2025-05-07T19:46:06.9588517Z libcufft-11.3.3.41 | 147.4 MB | #8 | 19%  2025-05-07T19:46:06.9589429Z 2025-05-07T19:46:06.9593439Z nsight-compute-2025. | 320.6 MB | # | 10%  2025-05-07T19:46:06.9594276Z 2025-05-07T19:46:06.9594817Z 2025-05-07T19:46:06.9607870Z libcusparse-12.5.7.5 | 164.9 MB | #5 | 16%  2025-05-07T19:46:06.9608755Z 2025-05-07T19:46:06.9608767Z 2025-05-07T19:46:06.9608795Z 2025-05-07T19:46:06.9616974Z libcusolver-11.7.2.5 | 156.9 MB | ##1 | 22%  2025-05-07T19:46:06.9656737Z libcublas-12.8.3.14 | 460.2 MB | 7 | 8% 2025-05-07T19:46:06.9657097Z 2025-05-07T19:46:06.9657107Z 2025-05-07T19:46:06.9657113Z 2025-05-07T19:46:06.9657120Z 2025-05-07T19:46:07.0602211Z libcufft-11.3.3.41 | 147.4 MB | ##2 | 23%  2025-05-07T19:46:07.0602574Z 2025-05-07T19:46:07.0602580Z 2025-05-07T19:46:07.0608372Z libcusparse-12.5.7.5 | 164.9 MB | #8 | 19%  2025-05-07T19:46:07.0608661Z 2025-05-07T19:46:07.0608667Z 2025-05-07T19:46:07.0608675Z 2025-05-07T19:46:07.0619868Z libcusolver-11.7.2.5 | 156.9 MB | ##6 | 26%  2025-05-07T19:46:07.0656448Z libcublas-12.8.3.14 | 460.2 MB | 9 | 9% 2025-05-07T19:46:07.0656804Z 2025-05-07T19:46:07.0656887Z 2025-05-07T19:46:07.0656894Z 2025-05-07T19:46:07.0656912Z 2025-05-07T19:46:07.1603079Z libcufft-11.3.3.41 | 147.4 MB | ##7 | 28%  2025-05-07T19:46:07.1603417Z 2025-05-07T19:46:07.1603422Z 2025-05-07T19:46:07.1622513Z libcusparse-12.5.7.5 | 164.9 MB | ##3 | 23%  2025-05-07T19:46:07.1657935Z libcublas-12.8.3.14 | 460.2 MB | # | 11% 2025-05-07T19:46:07.1658781Z 2025-05-07T19:46:07.1658795Z 2025-05-07T19:46:07.1658806Z 
2025-05-07T19:46:07.1658817Z 2025-05-07T19:46:07.1948779Z libcufft-11.3.3.41 | 147.4 MB | ###3 | 34%  2025-05-07T19:46:07.1949107Z 2025-05-07T19:46:07.1949114Z 2025-05-07T19:46:07.1949118Z 2025-05-07T19:46:07.1973302Z libcusolver-11.7.2.5 | 156.9 MB | ### | 30%  2025-05-07T19:46:07.1974107Z 2025-05-07T19:46:07.2603410Z nsight-compute-2025. | 320.6 MB | #1 | 12%  2025-05-07T19:46:07.2603723Z 2025-05-07T19:46:07.2603729Z 2025-05-07T19:46:07.2622520Z libcusparse-12.5.7.5 | 164.9 MB | ##7 | 27%  2025-05-07T19:46:07.2951896Z libcublas-12.8.3.14 | 460.2 MB | #2 | 12% 2025-05-07T19:46:07.2952466Z 2025-05-07T19:46:07.2952529Z 2025-05-07T19:46:07.2952536Z 2025-05-07T19:46:07.2972570Z libcusolver-11.7.2.5 | 156.9 MB | ###4 | 35%  2025-05-07T19:46:07.2972912Z 2025-05-07T19:46:07.3605447Z nsight-compute-2025. | 320.6 MB | #4 | 14%  2025-05-07T19:46:07.3605757Z 2025-05-07T19:46:07.3605764Z 2025-05-07T19:46:07.3623718Z libcusparse-12.5.7.5 | 164.9 MB | ###1 | 31%  2025-05-07T19:46:07.3952635Z libcublas-12.8.3.14 | 460.2 MB | #4 | 14% 2025-05-07T19:46:07.3952931Z 2025-05-07T19:46:07.3953112Z 2025-05-07T19:46:07.3953123Z 2025-05-07T19:46:07.3974502Z libcusolver-11.7.2.5 | 156.9 MB | ###9 | 39%  2025-05-07T19:46:07.3974838Z 2025-05-07T19:46:07.4535493Z nsight-compute-2025. | 320.6 MB | #6 | 17%  2025-05-07T19:46:07.4535808Z 2025-05-07T19:46:07.4535812Z 2025-05-07T19:46:07.4535815Z 2025-05-07T19:46:07.4535819Z 2025-05-07T19:46:07.4701792Z libcufft-11.3.3.41 | 147.4 MB | ###8 | 38%  2025-05-07T19:46:07.4702360Z 2025-05-07T19:46:07.4702378Z 2025-05-07T19:46:07.4765468Z libcusparse-12.5.7.5 | 164.9 MB | ###4 | 35%  2025-05-07T19:46:07.4973488Z libcublas-12.8.3.14 | 460.2 MB | #5 | 16% 2025-05-07T19:46:07.4974301Z 2025-05-07T19:46:07.4974315Z 2025-05-07T19:46:07.4974326Z 2025-05-07T19:46:07.4976021Z libcusolver-11.7.2.5 | 156.9 MB | ####3 | 43%  2025-05-07T19:46:07.4976866Z 2025-05-07T19:46:07.5539261Z nsight-compute-2025. 
| 320.6 MB | #8 | 19%  2025-05-07T19:46:07.5540146Z 2025-05-07T19:46:07.5540160Z 2025-05-07T19:46:07.5540171Z 2025-05-07T19:46:07.5540182Z 2025-05-07T19:46:07.5779744Z libcufft-11.3.3.41 | 147.4 MB | ####2 | 42%  2025-05-07T19:46:07.5978487Z libcublas-12.8.3.14 | 460.2 MB | #7 | 17% 2025-05-07T19:46:07.5979210Z 2025-05-07T19:46:07.6242880Z nsight-compute-2025. | 320.6 MB | ##1 | 22%  2025-05-07T19:46:07.6243209Z 2025-05-07T19:46:07.6243387Z 2025-05-07T19:46:07.6243401Z 2025-05-07T19:46:07.6541201Z libcusolver-11.7.2.5 | 156.9 MB | ####7 | 48%  2025-05-07T19:46:07.6541538Z 2025-05-07T19:46:07.6541543Z 2025-05-07T19:46:07.6541549Z 2025-05-07T19:46:07.6541555Z 2025-05-07T19:46:07.6715874Z libcufft-11.3.3.41 | 147.4 MB | ####7 | 47%  2025-05-07T19:46:07.6716187Z 2025-05-07T19:46:07.6716321Z 2025-05-07T19:46:07.6980002Z libcusparse-12.5.7.5 | 164.9 MB | ###8 | 38%  2025-05-07T19:46:07.6980326Z 2025-05-07T19:46:07.7074039Z nsight-compute-2025. | 320.6 MB | ##4 | 25%  2025-05-07T19:46:07.7245346Z libcublas-12.8.3.14 | 460.2 MB | #8 | 19% 2025-05-07T19:46:07.7245852Z 2025-05-07T19:46:07.7245926Z 2025-05-07T19:46:07.7245931Z 2025-05-07T19:46:07.7724871Z libcusolver-11.7.2.5 | 156.9 MB | #####2 | 52%  2025-05-07T19:46:07.7725200Z 2025-05-07T19:46:07.7725602Z 2025-05-07T19:46:07.8061730Z libcusparse-12.5.7.5 | 164.9 MB | ####1 | 42%  2025-05-07T19:46:07.8062325Z 2025-05-07T19:46:07.8062350Z 2025-05-07T19:46:07.8062354Z 2025-05-07T19:46:07.8062357Z 2025-05-07T19:46:07.8341308Z libcufft-11.3.3.41 | 147.4 MB | #####1 | 51%  2025-05-07T19:46:07.8538812Z libcublas-12.8.3.14 | 460.2 MB | ## | 20% 2025-05-07T19:46:07.8539110Z 2025-05-07T19:46:07.8539116Z 2025-05-07T19:46:07.8539121Z 2025-05-07T19:46:07.8650965Z libcusolver-11.7.2.5 | 156.9 MB | #####6 | 56%  2025-05-07T19:46:07.8651285Z 2025-05-07T19:46:07.8768530Z nsight-compute-2025. 
| 320.6 MB | ##7 | 27%  2025-05-07T19:46:07.8768831Z 2025-05-07T19:46:07.8768835Z 2025-05-07T19:46:07.9155068Z libcusparse-12.5.7.5 | 164.9 MB | ####4 | 45%  2025-05-07T19:46:07.9155390Z 2025-05-07T19:46:07.9155394Z 2025-05-07T19:46:07.9155398Z 2025-05-07T19:46:07.9155402Z 2025-05-07T19:46:07.9491307Z libcufft-11.3.3.41 | 147.4 MB | #####5 | 55%  2025-05-07T19:46:07.9668647Z libcublas-12.8.3.14 | 460.2 MB | ##1 | 21% 2025-05-07T19:46:07.9668976Z 2025-05-07T19:46:07.9668995Z 2025-05-07T19:46:07.9671385Z 2025-05-07T19:46:07.9851572Z libcusolver-11.7.2.5 | 156.9 MB | #####9 | 60%  2025-05-07T19:46:07.9852472Z 2025-05-07T19:46:07.9852486Z 2025-05-07T19:46:08.0172508Z libcusparse-12.5.7.5 | 164.9 MB | ####7 | 48%  2025-05-07T19:46:08.0172825Z 2025-05-07T19:46:08.0187796Z nsight-compute-2025. | 320.6 MB | ##9 | 29%  2025-05-07T19:46:08.0188094Z 2025-05-07T19:46:08.0188098Z 2025-05-07T19:46:08.0188102Z 2025-05-07T19:46:08.0188917Z 2025-05-07T19:46:08.0614191Z libcufft-11.3.3.41 | 147.4 MB | #####8 | 59%  2025-05-07T19:46:08.0864372Z libcublas-12.8.3.14 | 460.2 MB | ##2 | 23% 2025-05-07T19:46:08.0865205Z 2025-05-07T19:46:08.0868317Z 2025-05-07T19:46:08.0873719Z libcusparse-12.5.7.5 | 164.9 MB | ##### | 51%  2025-05-07T19:46:08.0874021Z 2025-05-07T19:46:08.0874027Z 2025-05-07T19:46:08.0874031Z 2025-05-07T19:46:08.1294692Z libcusolver-11.7.2.5 | 156.9 MB | ######3 | 64%  2025-05-07T19:46:08.1295025Z 2025-05-07T19:46:08.1295239Z 2025-05-07T19:46:08.1295245Z 2025-05-07T19:46:08.1295299Z 2025-05-07T19:46:08.1494518Z libcufft-11.3.3.41 | 147.4 MB | ######2 | 63%  2025-05-07T19:46:08.1494877Z 2025-05-07T19:46:08.1721827Z nsight-compute-2025. 
| 320.6 MB | ###1 | 31%  2025-05-07T19:46:08.1897334Z libcublas-12.8.3.14 | 460.2 MB | ##3 | 24% 2025-05-07T19:46:08.1897890Z 2025-05-07T19:46:08.1898009Z 2025-05-07T19:46:08.1932111Z libcusparse-12.5.7.5 | 164.9 MB | #####3 | 53%  2025-05-07T19:46:08.1932427Z 2025-05-07T19:46:08.1932680Z 2025-05-07T19:46:08.1932699Z 2025-05-07T19:46:08.2303850Z libcusolver-11.7.2.5 | 156.9 MB | ######6 | 67%  2025-05-07T19:46:08.2304749Z 2025-05-07T19:46:08.2304763Z 2025-05-07T19:46:08.2304776Z 2025-05-07T19:46:08.2304818Z 2025-05-07T19:46:08.2789492Z libcufft-11.3.3.41 | 147.4 MB | ######6 | 66%  2025-05-07T19:46:08.2789948Z 2025-05-07T19:46:08.2834299Z nsight-compute-2025. | 320.6 MB | ###3 | 33%  2025-05-07T19:46:08.2957306Z libcublas-12.8.3.14 | 460.2 MB | ##5 | 25% 2025-05-07T19:46:08.2957628Z 2025-05-07T19:46:08.2957899Z 2025-05-07T19:46:08.3007908Z libcusparse-12.5.7.5 | 164.9 MB | #####6 | 56%  2025-05-07T19:46:08.3008716Z 2025-05-07T19:46:08.3008721Z 2025-05-07T19:46:08.3008724Z 2025-05-07T19:46:08.3360208Z libcusolver-11.7.2.5 | 156.9 MB | ####### | 70%  2025-05-07T19:46:08.3360535Z 2025-05-07T19:46:08.3360729Z 2025-05-07T19:46:08.3360741Z 2025-05-07T19:46:08.3360748Z 2025-05-07T19:46:08.3861342Z libcufft-11.3.3.41 | 147.4 MB | ######9 | 70%  2025-05-07T19:46:08.3997176Z libcublas-12.8.3.14 | 460.2 MB | ##6 | 26% 2025-05-07T19:46:08.3997715Z 2025-05-07T19:46:08.3997792Z 2025-05-07T19:46:08.4038696Z libcusparse-12.5.7.5 | 164.9 MB | #####9 | 59%  2025-05-07T19:46:08.4040033Z 2025-05-07T19:46:08.4040048Z 2025-05-07T19:46:08.4040089Z 2025-05-07T19:46:08.4041873Z libcusolver-11.7.2.5 | 156.9 MB | #######3 | 74%  2025-05-07T19:46:08.4043218Z 2025-05-07T19:46:08.4421198Z nsight-compute-2025. 
| 320.6 MB | ###4 | 35%  2025-05-07T19:46:08.4421495Z 2025-05-07T19:46:08.4421500Z 2025-05-07T19:46:08.4421526Z 2025-05-07T19:46:08.4421530Z 2025-05-07T19:46:08.5001513Z libcufft-11.3.3.41 | 147.4 MB | #######3 | 73%  2025-05-07T19:46:08.5001997Z 2025-05-07T19:46:08.5002116Z 2025-05-07T19:46:08.5018620Z libcusparse-12.5.7.5 | 164.9 MB | ######2 | 62%  2025-05-07T19:46:08.5161488Z libcublas-12.8.3.14 | 460.2 MB | ##7 | 27% 2025-05-07T19:46:08.5162318Z 2025-05-07T19:46:08.5163138Z nsight-compute-2025. | 320.6 MB | ###6 | 36%  2025-05-07T19:46:08.5163929Z 2025-05-07T19:46:08.5163940Z 2025-05-07T19:46:08.5165826Z 2025-05-07T19:46:08.5420888Z libcusolver-11.7.2.5 | 156.9 MB | #######7 | 77%  2025-05-07T19:46:08.5421220Z 2025-05-07T19:46:08.5421225Z 2025-05-07T19:46:08.5421250Z 2025-05-07T19:46:08.5421295Z 2025-05-07T19:46:08.6009396Z libcufft-11.3.3.41 | 147.4 MB | #######6 | 77%  2025-05-07T19:46:08.6010015Z 2025-05-07T19:46:08.6010099Z 2025-05-07T19:46:08.6014677Z libcusparse-12.5.7.5 | 164.9 MB | ######4 | 65%  2025-05-07T19:46:08.6166685Z libcublas-12.8.3.14 | 460.2 MB | ##8 | 28% 2025-05-07T19:46:08.6167070Z 2025-05-07T19:46:08.6167251Z 2025-05-07T19:46:08.6167257Z 2025-05-07T19:46:08.6204281Z libcusolver-11.7.2.5 | 156.9 MB | ######## | 80%  2025-05-07T19:46:08.6204642Z 2025-05-07T19:46:08.6448439Z nsight-compute-2025. 
| 320.6 MB | ###8 | 38%  2025-05-07T19:46:08.6448753Z 2025-05-07T19:46:08.6448761Z 2025-05-07T19:46:08.6448766Z 2025-05-07T19:46:08.6448770Z 2025-05-07T19:46:08.7011720Z libcufft-11.3.3.41 | 147.4 MB | #######9 | 80%  2025-05-07T19:46:08.7012044Z 2025-05-07T19:46:08.7012051Z 2025-05-07T19:46:08.7022326Z libcusparse-12.5.7.5 | 164.9 MB | ######8 | 68%  2025-05-07T19:46:08.7169642Z libcublas-12.8.3.14 | 460.2 MB | ##9 | 30% 2025-05-07T19:46:08.7169929Z 2025-05-07T19:46:08.7169934Z 2025-05-07T19:46:08.7170273Z 2025-05-07T19:46:08.7207594Z libcusolver-11.7.2.5 | 156.9 MB | ########3 | 84%  2025-05-07T19:46:08.7207985Z 2025-05-07T19:46:08.7448702Z nsight-compute-2025. | 320.6 MB | ###9 | 40%  2025-05-07T19:46:08.7449038Z 2025-05-07T19:46:08.7449045Z 2025-05-07T19:46:08.7449052Z 2025-05-07T19:46:08.7449058Z 2025-05-07T19:46:08.8012833Z libcufft-11.3.3.41 | 147.4 MB | ########3 | 84%  2025-05-07T19:46:08.8013155Z 2025-05-07T19:46:08.8013359Z 2025-05-07T19:46:08.8039967Z libcusparse-12.5.7.5 | 164.9 MB | #######1 | 71%  2025-05-07T19:46:08.8171592Z libcublas-12.8.3.14 | 460.2 MB | ### | 31% 2025-05-07T19:46:08.8171911Z 2025-05-07T19:46:08.8171917Z 2025-05-07T19:46:08.8171920Z 2025-05-07T19:46:08.8206562Z libcusolver-11.7.2.5 | 156.9 MB | ########6 | 87%  2025-05-07T19:46:08.8206967Z 2025-05-07T19:46:08.8452251Z nsight-compute-2025. 
| 320.6 MB | ####1 | 41%  2025-05-07T19:46:08.8453105Z 2025-05-07T19:46:08.8453119Z 2025-05-07T19:46:08.8453130Z 2025-05-07T19:46:08.8453140Z 2025-05-07T19:46:08.9020298Z libcufft-11.3.3.41 | 147.4 MB | ########7 | 87%  2025-05-07T19:46:08.9020639Z 2025-05-07T19:46:08.9020644Z 2025-05-07T19:46:08.9040323Z libcusparse-12.5.7.5 | 164.9 MB | #######4 | 74%  2025-05-07T19:46:08.9199369Z libcublas-12.8.3.14 | 460.2 MB | ###1 | 32% 2025-05-07T19:46:08.9199922Z 2025-05-07T19:46:08.9199982Z 2025-05-07T19:46:08.9199991Z 2025-05-07T19:46:08.9208714Z libcusolver-11.7.2.5 | 156.9 MB | ######### | 90%  2025-05-07T19:46:08.9209390Z 2025-05-07T19:46:08.9453406Z nsight-compute-2025. | 320.6 MB | ####2 | 43%  2025-05-07T19:46:08.9453795Z 2025-05-07T19:46:08.9454021Z 2025-05-07T19:46:08.9454141Z 2025-05-07T19:46:08.9454403Z 2025-05-07T19:46:09.0066828Z libcufft-11.3.3.41 | 147.4 MB | ######### | 91%  2025-05-07T19:46:09.0067138Z 2025-05-07T19:46:09.0067150Z 2025-05-07T19:46:09.0101031Z libcusparse-12.5.7.5 | 164.9 MB | #######7 | 77%  2025-05-07T19:46:09.0211417Z libcublas-12.8.3.14 | 460.2 MB | ###2 | 33% 2025-05-07T19:46:09.0211814Z 2025-05-07T19:46:09.0211894Z 2025-05-07T19:46:09.0211899Z 2025-05-07T19:46:09.0254327Z libcusolver-11.7.2.5 | 156.9 MB | #########3 | 93%  2025-05-07T19:46:09.0255208Z 2025-05-07T19:46:09.0491246Z nsight-compute-2025. 
| 320.6 MB | ####4 | 44%  2025-05-07T19:46:09.0491542Z 2025-05-07T19:46:09.0491573Z 2025-05-07T19:46:09.0491577Z 2025-05-07T19:46:09.0491581Z 2025-05-07T19:46:09.1066971Z libcufft-11.3.3.41 | 147.4 MB | #########4 | 94%  2025-05-07T19:46:09.1067299Z 2025-05-07T19:46:09.1067452Z 2025-05-07T19:46:09.1103226Z libcusparse-12.5.7.5 | 164.9 MB | ######## | 80%  2025-05-07T19:46:09.1213390Z libcublas-12.8.3.14 | 460.2 MB | ###4 | 34% 2025-05-07T19:46:09.1213756Z 2025-05-07T19:46:09.1213857Z 2025-05-07T19:46:09.1213873Z 2025-05-07T19:46:09.1256388Z libcusolver-11.7.2.5 | 156.9 MB | #########6 | 97%  2025-05-07T19:46:09.1256778Z 2025-05-07T19:46:09.1492792Z nsight-compute-2025. | 320.6 MB | ####5 | 46%  2025-05-07T19:46:09.1493103Z 2025-05-07T19:46:09.1493200Z 2025-05-07T19:46:09.1493210Z 2025-05-07T19:46:09.1493215Z 2025-05-07T19:46:09.2069568Z libcufft-11.3.3.41 | 147.4 MB | #########7 | 98%  2025-05-07T19:46:09.2069913Z 2025-05-07T19:46:09.2069928Z 2025-05-07T19:46:09.2103578Z libcusparse-12.5.7.5 | 164.9 MB | ########3 | 83%  2025-05-07T19:46:09.2218401Z libcublas-12.8.3.14 | 460.2 MB | ###5 | 35% 2025-05-07T19:46:09.2218699Z 2025-05-07T19:46:09.2218705Z 2025-05-07T19:46:09.2218710Z 2025-05-07T19:46:09.2291445Z libcusolver-11.7.2.5 | 156.9 MB | #########9 | 100%  2025-05-07T19:46:09.2291772Z 2025-05-07T19:46:09.3069929Z nsight-compute-2025. | 320.6 MB | ####7 | 48%  2025-05-07T19:46:09.3070297Z 2025-05-07T19:46:09.3070309Z 2025-05-07T19:46:09.3104597Z libcusparse-12.5.7.5 | 164.9 MB | ########7 | 87%  2025-05-07T19:46:09.3294361Z libcublas-12.8.3.14 | 460.2 MB | ###7 | 37% 2025-05-07T19:46:09.3294805Z 2025-05-07T19:46:09.4069501Z nsight-compute-2025. | 320.6 MB | ##### | 50%  2025-05-07T19:46:09.4069845Z 2025-05-07T19:46:09.4069851Z 2025-05-07T19:46:09.4105512Z libcusparse-12.5.7.5 | 164.9 MB | #########2 | 92%  2025-05-07T19:46:09.4296172Z libcublas-12.8.3.14 | 460.2 MB | ###8 | 39% 2025-05-07T19:46:09.4296713Z 2025-05-07T19:46:09.5069840Z nsight-compute-2025. 
| 320.6 MB | #####2 | 53%  2025-05-07T19:46:09.5070155Z 2025-05-07T19:46:09.5070161Z 2025-05-07T19:46:09.5105635Z libcusparse-12.5.7.5 | 164.9 MB | #########7 | 97%  2025-05-07T19:46:09.5296767Z libcublas-12.8.3.14 | 460.2 MB | #### | 41% 2025-05-07T19:46:09.5297218Z 2025-05-07T19:46:09.6107458Z nsight-compute-2025. | 320.6 MB | #####5 | 55%  2025-05-07T19:46:09.6298143Z libcublas-12.8.3.14 | 460.2 MB | ####2 | 43% 2025-05-07T19:46:09.6298453Z 2025-05-07T19:46:09.7107986Z nsight-compute-2025. | 320.6 MB | #####8 | 58%  2025-05-07T19:46:09.7298759Z libcublas-12.8.3.14 | 460.2 MB | ####5 | 45% 2025-05-07T19:46:09.7299044Z 2025-05-07T19:46:09.8108191Z nsight-compute-2025. | 320.6 MB | ######1 | 61%  2025-05-07T19:46:09.8299597Z libcublas-12.8.3.14 | 460.2 MB | ####7 | 48% 2025-05-07T19:46:09.8299877Z 2025-05-07T19:46:09.9109361Z nsight-compute-2025. | 320.6 MB | ######4 | 65%  2025-05-07T19:46:09.9299649Z libcublas-12.8.3.14 | 460.2 MB | ####9 | 50% 2025-05-07T19:46:09.9299936Z 2025-05-07T19:46:10.0112105Z nsight-compute-2025. | 320.6 MB | ######7 | 68%  2025-05-07T19:46:10.0300491Z libcublas-12.8.3.14 | 460.2 MB | #####2 | 52% 2025-05-07T19:46:10.0300785Z 2025-05-07T19:46:10.1114481Z nsight-compute-2025. | 320.6 MB | #######1 | 71%  2025-05-07T19:46:10.1301067Z libcublas-12.8.3.14 | 460.2 MB | #####4 | 55% 2025-05-07T19:46:10.1301348Z 2025-05-07T19:46:10.2114457Z nsight-compute-2025. | 320.6 MB | #######4 | 74%  2025-05-07T19:46:10.2301285Z libcublas-12.8.3.14 | 460.2 MB | #####7 | 57% 2025-05-07T19:46:10.2302303Z 2025-05-07T19:46:10.3115401Z nsight-compute-2025. | 320.6 MB | #######7 | 78%  2025-05-07T19:46:10.3301481Z libcublas-12.8.3.14 | 460.2 MB | #####9 | 60% 2025-05-07T19:46:10.3301779Z 2025-05-07T19:46:10.4115725Z nsight-compute-2025. | 320.6 MB | ########1 | 81%  2025-05-07T19:46:10.4303036Z libcublas-12.8.3.14 | 460.2 MB | ######1 | 62% 2025-05-07T19:46:10.4303316Z 2025-05-07T19:46:10.5115704Z nsight-compute-2025. 
| 320.6 MB | ########4 | 84%  2025-05-07T19:46:10.5303198Z libcublas-12.8.3.14 | 460.2 MB | ######4 | 64% 2025-05-07T19:46:10.5303487Z 2025-05-07T19:46:10.6130728Z nsight-compute-2025. | 320.6 MB | ########7 | 88%  2025-05-07T19:46:10.6339687Z libcublas-12.8.3.14 | 460.2 MB | ######6 | 67% 2025-05-07T19:46:10.6339972Z 2025-05-07T19:46:10.7130924Z nsight-compute-2025. | 320.6 MB | ######### | 91%  2025-05-07T19:46:10.7341231Z libcublas-12.8.3.14 | 460.2 MB | ######9 | 69% 2025-05-07T19:46:10.7341529Z 2025-05-07T19:46:10.8167694Z nsight-compute-2025. | 320.6 MB | #########4 | 94%  2025-05-07T19:46:10.8373804Z libcublas-12.8.3.14 | 460.2 MB | #######1 | 72% 2025-05-07T19:46:10.8374294Z 2025-05-07T19:46:10.8457489Z nsight-compute-2025. | 320.6 MB | #########7 | 97%  2025-05-07T19:46:10.8458517Z 2025-05-07T19:46:10.8458524Z 2025-05-07T19:46:10.8458529Z 2025-05-07T19:46:10.8458532Z 2025-05-07T19:46:10.8784133Z libcufft-11.3.3.41 | 147.4 MB | ########## | 100%  2025-05-07T19:46:10.8784471Z 2025-05-07T19:46:10.8784478Z 2025-05-07T19:46:10.8784487Z 2025-05-07T19:46:10.8784496Z 2025-05-07T19:46:10.8784510Z 2025-05-07T19:46:10.9468160Z libnpp-12.3.3.65 | 130.6 MB | | 0%  2025-05-07T19:46:10.9728460Z libcublas-12.8.3.14 | 460.2 MB | #######3 | 74% 2025-05-07T19:46:10.9729072Z 2025-05-07T19:46:10.9729093Z 2025-05-07T19:46:10.9729100Z 2025-05-07T19:46:10.9784653Z libcusolver-11.7.2.5 | 156.9 MB | ########## | 100%  2025-05-07T19:46:10.9785009Z 2025-05-07T19:46:10.9785016Z 2025-05-07T19:46:10.9785023Z 2025-05-07T19:46:10.9785028Z 2025-05-07T19:46:10.9785034Z 2025-05-07T19:46:11.0136132Z libnpp-12.3.3.65 | 130.6 MB | 5 | 5%  2025-05-07T19:46:11.0136494Z 2025-05-07T19:46:11.0136501Z 2025-05-07T19:46:11.0136507Z 2025-05-07T19:46:11.0136513Z 2025-05-07T19:46:11.0136518Z 2025-05-07T19:46:11.0136523Z 2025-05-07T19:46:11.0779388Z cuda-nsight-12.8.55 | 113.2 MB | | 0%  2025-05-07T19:46:11.0784857Z libcublas-12.8.3.14 | 460.2 MB | #######6 | 76% 2025-05-07T19:46:11.0785124Z 
2025-05-07T19:46:11.0785134Z 2025-05-07T19:46:11.0785172Z 2025-05-07T19:46:11.0785199Z 2025-05-07T19:46:11.0785786Z 2025-05-07T19:46:11.1137419Z libnpp-12.3.3.65 | 130.6 MB | #1 | 11%  2025-05-07T19:46:11.1137750Z 2025-05-07T19:46:11.1137754Z 2025-05-07T19:46:11.1137758Z 2025-05-07T19:46:11.1137761Z 2025-05-07T19:46:11.1137764Z 2025-05-07T19:46:11.1137768Z 2025-05-07T19:46:11.1789633Z cuda-nsight-12.8.55 | 113.2 MB | 5 | 5%  2025-05-07T19:46:11.1789976Z 2025-05-07T19:46:11.1789980Z 2025-05-07T19:46:11.1789984Z 2025-05-07T19:46:11.1789987Z 2025-05-07T19:46:11.1789991Z 2025-05-07T19:46:11.2039008Z libnpp-12.3.3.65 | 130.6 MB | #6 | 17%  2025-05-07T19:46:11.2138285Z libcublas-12.8.3.14 | 460.2 MB | #######8 | 78% 2025-05-07T19:46:11.2138599Z 2025-05-07T19:46:11.2138808Z 2025-05-07T19:46:11.2138823Z 2025-05-07T19:46:11.2138830Z 2025-05-07T19:46:11.2138899Z 2025-05-07T19:46:11.2138906Z 2025-05-07T19:46:11.2797684Z cuda-nsight-12.8.55 | 113.2 MB | # | 10%  2025-05-07T19:46:11.2798330Z 2025-05-07T19:46:11.2798335Z 2025-05-07T19:46:11.2798339Z 2025-05-07T19:46:11.2798345Z 2025-05-07T19:46:11.2798348Z 2025-05-07T19:46:11.3222611Z libnpp-12.3.3.65 | 130.6 MB | ##2 | 22%  2025-05-07T19:46:11.3489941Z libcublas-12.8.3.14 | 460.2 MB | ######## | 80% 2025-05-07T19:46:11.3490281Z 2025-05-07T19:46:11.3490483Z 2025-05-07T19:46:11.3490495Z 2025-05-07T19:46:11.3490502Z 2025-05-07T19:46:11.3490512Z 2025-05-07T19:46:11.3490518Z 2025-05-07T19:46:11.3800013Z cuda-nsight-12.8.55 | 113.2 MB | #5 | 16%  2025-05-07T19:46:11.3800365Z 2025-05-07T19:46:11.3800385Z 2025-05-07T19:46:11.3800388Z 2025-05-07T19:46:11.3800392Z 2025-05-07T19:46:11.3800396Z 2025-05-07T19:46:11.4365385Z libnpp-12.3.3.65 | 130.6 MB | ##7 | 28%  2025-05-07T19:46:11.4491298Z libcublas-12.8.3.14 | 460.2 MB | ########1 | 82% 2025-05-07T19:46:11.4491699Z 2025-05-07T19:46:11.4491786Z 2025-05-07T19:46:11.4491878Z 2025-05-07T19:46:11.4491882Z 2025-05-07T19:46:11.4491885Z 2025-05-07T19:46:11.4491902Z 2025-05-07T19:46:11.4874846Z 
cuda-nsight-12.8.55 | 113.2 MB | ##1 | 22%  2025-05-07T19:46:11.4875175Z 2025-05-07T19:46:11.4875322Z 2025-05-07T19:46:11.4875326Z 2025-05-07T19:46:11.4875329Z 2025-05-07T19:46:11.4875332Z 2025-05-07T19:46:11.5492768Z libnpp-12.3.3.65 | 130.6 MB | ###3 | 33%  2025-05-07T19:46:11.5493161Z 2025-05-07T19:46:11.5493169Z 2025-05-07T19:46:11.5493175Z 2025-05-07T19:46:11.5493181Z 2025-05-07T19:46:11.5493188Z 2025-05-07T19:46:11.5493194Z 2025-05-07T19:46:11.5557810Z cuda-nsight-12.8.55 | 113.2 MB | ##7 | 27%  2025-05-07T19:46:11.5874297Z libcublas-12.8.3.14 | 460.2 MB | ########3 | 84% 2025-05-07T19:46:11.5874596Z 2025-05-07T19:46:11.5874783Z 2025-05-07T19:46:11.5874787Z 2025-05-07T19:46:11.5874790Z 2025-05-07T19:46:11.5874794Z 2025-05-07T19:46:11.6094386Z libnpp-12.3.3.65 | 130.6 MB | ###8 | 38%  2025-05-07T19:46:11.6094719Z 2025-05-07T19:46:11.6094724Z 2025-05-07T19:46:11.6487818Z libcusparse-12.5.7.5 | 164.9 MB | ########## | 100%  2025-05-07T19:46:11.6488350Z 2025-05-07T19:46:11.6488354Z 2025-05-07T19:46:11.6488358Z 2025-05-07T19:46:11.6488362Z 2025-05-07T19:46:11.6488365Z 2025-05-07T19:46:11.6488368Z 2025-05-07T19:46:11.6488371Z 2025-05-07T19:46:11.6490981Z cuda-nvvp-12.8.57 | 112.4 MB | | 0%  2025-05-07T19:46:11.6491291Z 2025-05-07T19:46:11.6491296Z 2025-05-07T19:46:11.6491301Z 2025-05-07T19:46:11.6491304Z 2025-05-07T19:46:11.6491307Z 2025-05-07T19:46:11.6492405Z 2025-05-07T19:46:11.6795638Z cuda-nsight-12.8.55 | 113.2 MB | ###2 | 33%  2025-05-07T19:46:11.7334913Z libcublas-12.8.3.14 | 460.2 MB | ########5 | 85% 2025-05-07T19:46:11.7335267Z 2025-05-07T19:46:11.7335349Z 2025-05-07T19:46:11.7335356Z 2025-05-07T19:46:11.7335390Z 2025-05-07T19:46:11.7335582Z 2025-05-07T19:46:11.7486755Z libnpp-12.3.3.65 | 130.6 MB | ####3 | 44%  2025-05-07T19:46:11.7487175Z 2025-05-07T19:46:11.7487180Z 2025-05-07T19:46:11.7487211Z 2025-05-07T19:46:11.7487216Z 2025-05-07T19:46:11.7487221Z 2025-05-07T19:46:11.7487226Z 2025-05-07T19:46:11.7487231Z 2025-05-07T19:46:11.7944430Z 
cuda-nvvp-12.8.57 | 112.4 MB | 3 | 4%  2025-05-07T19:46:11.7944783Z 2025-05-07T19:46:11.7944788Z 2025-05-07T19:46:11.7944796Z 2025-05-07T19:46:11.7944803Z 2025-05-07T19:46:11.7944809Z 2025-05-07T19:46:11.7944815Z 2025-05-07T19:46:11.8285228Z cuda-nsight-12.8.55 | 113.2 MB | ###8 | 38%  2025-05-07T19:46:11.8487702Z libcublas-12.8.3.14 | 460.2 MB | ########6 | 87% 2025-05-07T19:46:11.8488029Z 2025-05-07T19:46:11.8488077Z 2025-05-07T19:46:11.8488223Z 2025-05-07T19:46:11.8488234Z 2025-05-07T19:46:11.8488241Z 2025-05-07T19:46:11.8488251Z 2025-05-07T19:46:11.8489954Z 2025-05-07T19:46:11.8660373Z cuda-nvvp-12.8.57 | 112.4 MB | 8 | 8%  2025-05-07T19:46:11.8660707Z 2025-05-07T19:46:11.8660717Z 2025-05-07T19:46:11.8660722Z 2025-05-07T19:46:11.8660726Z 2025-05-07T19:46:11.8661555Z 2025-05-07T19:46:11.9063003Z libnpp-12.3.3.65 | 130.6 MB | ####8 | 48%  2025-05-07T19:46:11.9063322Z 2025-05-07T19:46:11.9063329Z 2025-05-07T19:46:11.9063333Z 2025-05-07T19:46:11.9063336Z 2025-05-07T19:46:11.9063339Z 2025-05-07T19:46:11.9063343Z 2025-05-07T19:46:11.9486613Z cuda-nsight-12.8.55 | 113.2 MB | ####2 | 43%  2025-05-07T19:46:11.9486971Z 2025-05-07T19:46:11.9487046Z 2025-05-07T19:46:11.9487051Z 2025-05-07T19:46:11.9487233Z 2025-05-07T19:46:11.9487248Z 2025-05-07T19:46:11.9487255Z 2025-05-07T19:46:11.9487317Z 2025-05-07T19:46:11.9525688Z cuda-nvvp-12.8.57 | 112.4 MB | #2 | 12%  2025-05-07T19:46:11.9916209Z libcublas-12.8.3.14 | 460.2 MB | ########8 | 88% 2025-05-07T19:46:11.9916812Z 2025-05-07T19:46:11.9916892Z 2025-05-07T19:46:11.9916900Z 2025-05-07T19:46:11.9916918Z 2025-05-07T19:46:11.9916923Z 2025-05-07T19:46:12.0103724Z libnpp-12.3.3.65 | 130.6 MB | #####2 | 53%  2025-05-07T19:46:12.0104045Z 2025-05-07T19:46:12.0104051Z 2025-05-07T19:46:12.0104055Z 2025-05-07T19:46:12.0104068Z 2025-05-07T19:46:12.0104072Z 2025-05-07T19:46:12.0104077Z 2025-05-07T19:46:12.0492520Z cuda-nsight-12.8.55 | 113.2 MB | ####7 | 47%  2025-05-07T19:46:12.0493324Z 2025-05-07T19:46:12.0493349Z 
2025-05-07T19:46:12.0493365Z 2025-05-07T19:46:12.0493379Z 2025-05-07T19:46:12.0493393Z 2025-05-07T19:46:12.0493407Z 2025-05-07T19:46:12.0493420Z 2025-05-07T19:46:12.0773784Z cuda-nvvp-12.8.57 | 112.4 MB | #6 | 17%  2025-05-07T19:46:12.1047384Z libcublas-12.8.3.14 | 460.2 MB | ########9 | 90% 2025-05-07T19:46:12.1047688Z 2025-05-07T19:46:12.1047855Z 2025-05-07T19:46:12.1047894Z 2025-05-07T19:46:12.1047902Z 2025-05-07T19:46:12.1048136Z 2025-05-07T19:46:12.1273421Z libnpp-12.3.3.65 | 130.6 MB | #####6 | 57%  2025-05-07T19:46:12.1274327Z 2025-05-07T19:46:12.1274339Z 2025-05-07T19:46:12.1274368Z 2025-05-07T19:46:12.1274379Z 2025-05-07T19:46:12.1274390Z 2025-05-07T19:46:12.1274400Z 2025-05-07T19:46:12.1492008Z cuda-nsight-12.8.55 | 113.2 MB | #####1 | 52%  2025-05-07T19:46:12.1492345Z 2025-05-07T19:46:12.1492442Z 2025-05-07T19:46:12.1492451Z 2025-05-07T19:46:12.1492457Z 2025-05-07T19:46:12.1492462Z 2025-05-07T19:46:12.1492467Z 2025-05-07T19:46:12.1492471Z 2025-05-07T19:46:12.1850642Z cuda-nvvp-12.8.57 | 112.4 MB | ##1 | 21%  2025-05-07T19:46:12.2141374Z libcublas-12.8.3.14 | 460.2 MB | ######### | 91% 2025-05-07T19:46:12.2141716Z 2025-05-07T19:46:12.2141721Z 2025-05-07T19:46:12.2141725Z 2025-05-07T19:46:12.2141728Z 2025-05-07T19:46:12.2141733Z 2025-05-07T19:46:12.2497060Z libnpp-12.3.3.65 | 130.6 MB | ###### | 61%  2025-05-07T19:46:12.2497560Z 2025-05-07T19:46:12.2497565Z 2025-05-07T19:46:12.2497569Z 2025-05-07T19:46:12.2497572Z 2025-05-07T19:46:12.2497576Z 2025-05-07T19:46:12.2497579Z 2025-05-07T19:46:12.2497582Z 2025-05-07T19:46:12.2504140Z cuda-nvvp-12.8.57 | 112.4 MB | ##5 | 25%  2025-05-07T19:46:12.2504486Z 2025-05-07T19:46:12.2504490Z 2025-05-07T19:46:12.2504493Z 2025-05-07T19:46:12.2504497Z 2025-05-07T19:46:12.2504501Z 2025-05-07T19:46:12.2507829Z 2025-05-07T19:46:12.2924712Z cuda-nsight-12.8.55 | 113.2 MB | #####6 | 56%  2025-05-07T19:46:12.3280181Z libcublas-12.8.3.14 | 460.2 MB | #########2 | 92% 2025-05-07T19:46:12.3280709Z 2025-05-07T19:46:12.3280739Z 
2025-05-07T19:46:12.3280746Z 2025-05-07T19:46:12.3280752Z 2025-05-07T19:46:12.3280758Z 2025-05-07T19:46:12.3507218Z libnpp-12.3.3.65 | 130.6 MB | ######4 | 65%  2025-05-07T19:46:12.3507536Z 2025-05-07T19:46:12.3507796Z 2025-05-07T19:46:12.3507820Z 2025-05-07T19:46:12.3507824Z 2025-05-07T19:46:12.3507827Z 2025-05-07T19:46:12.3507831Z 2025-05-07T19:46:12.3532439Z cuda-nsight-12.8.55 | 113.2 MB | ###### | 61%  2025-05-07T19:46:12.3532777Z 2025-05-07T19:46:12.3532782Z 2025-05-07T19:46:12.3532786Z 2025-05-07T19:46:12.3532789Z 2025-05-07T19:46:12.3532793Z 2025-05-07T19:46:12.3532796Z 2025-05-07T19:46:12.3532800Z 2025-05-07T19:46:12.4002302Z cuda-nvvp-12.8.57 | 112.4 MB | ##9 | 29%  2025-05-07T19:46:12.4320377Z libcublas-12.8.3.14 | 460.2 MB | #########3 | 93% 2025-05-07T19:46:12.4320903Z 2025-05-07T19:46:12.4320935Z 2025-05-07T19:46:12.4320939Z 2025-05-07T19:46:12.4320976Z 2025-05-07T19:46:12.4320983Z 2025-05-07T19:46:12.4535671Z libnpp-12.3.3.65 | 130.6 MB | ######8 | 68%  2025-05-07T19:46:12.4535988Z 2025-05-07T19:46:12.4536049Z 2025-05-07T19:46:12.4536053Z 2025-05-07T19:46:12.4536061Z 2025-05-07T19:46:12.4536086Z 2025-05-07T19:46:12.4536105Z 2025-05-07T19:46:12.4536955Z cuda-nsight-12.8.55 | 113.2 MB | ######4 | 65%  2025-05-07T19:46:12.4537349Z 2025-05-07T19:46:12.4537354Z 2025-05-07T19:46:12.4537364Z 2025-05-07T19:46:12.4537367Z 2025-05-07T19:46:12.4537370Z 2025-05-07T19:46:12.4537374Z 2025-05-07T19:46:12.4539693Z 2025-05-07T19:46:12.5062290Z cuda-nvvp-12.8.57 | 112.4 MB | ###3 | 34%  2025-05-07T19:46:12.5323134Z libcublas-12.8.3.14 | 460.2 MB | #########4 | 94% 2025-05-07T19:46:12.5323463Z 2025-05-07T19:46:12.5323469Z 2025-05-07T19:46:12.5323475Z 2025-05-07T19:46:12.5323482Z 2025-05-07T19:46:12.5323487Z 2025-05-07T19:46:12.5538719Z libnpp-12.3.3.65 | 130.6 MB | #######1 | 72%  2025-05-07T19:46:12.5539073Z 2025-05-07T19:46:12.5539078Z 2025-05-07T19:46:12.5539085Z 2025-05-07T19:46:12.5539090Z 2025-05-07T19:46:12.5539093Z 2025-05-07T19:46:12.5539098Z 
2025-05-07T19:46:12.5539104Z 2025-05-07T19:46:12.5539750Z cuda-nvvp-12.8.57 | 112.4 MB | ###7 | 38%  2025-05-07T19:46:12.5540088Z 2025-05-07T19:46:12.5540092Z 2025-05-07T19:46:12.5540095Z 2025-05-07T19:46:12.5540098Z 2025-05-07T19:46:12.5540102Z 2025-05-07T19:46:12.5540105Z 2025-05-07T19:46:12.6122108Z cuda-nsight-12.8.55 | 113.2 MB | ######9 | 69%  2025-05-07T19:46:12.6357787Z libcublas-12.8.3.14 | 460.2 MB | #########5 | 96% 2025-05-07T19:46:12.6358099Z 2025-05-07T19:46:12.6358312Z 2025-05-07T19:46:12.6358325Z 2025-05-07T19:46:12.6358335Z 2025-05-07T19:46:12.6358342Z 2025-05-07T19:46:12.6538927Z libnpp-12.3.3.65 | 130.6 MB | #######5 | 76%  2025-05-07T19:46:12.6539270Z 2025-05-07T19:46:12.6539279Z 2025-05-07T19:46:12.6539283Z 2025-05-07T19:46:12.6539288Z 2025-05-07T19:46:12.6539291Z 2025-05-07T19:46:12.6539294Z 2025-05-07T19:46:12.6539299Z 2025-05-07T19:46:12.6568885Z cuda-nvvp-12.8.57 | 112.4 MB | ####2 | 42%  2025-05-07T19:46:12.6569252Z 2025-05-07T19:46:12.6569257Z 2025-05-07T19:46:12.6569283Z 2025-05-07T19:46:12.6569288Z 2025-05-07T19:46:12.6569292Z 2025-05-07T19:46:12.6569295Z 2025-05-07T19:46:12.7120600Z cuda-nsight-12.8.55 | 113.2 MB | #######3 | 73%  2025-05-07T19:46:12.7357713Z libcublas-12.8.3.14 | 460.2 MB | #########6 | 97% 2025-05-07T19:46:12.7358092Z 2025-05-07T19:46:12.7358248Z 2025-05-07T19:46:12.7358256Z 2025-05-07T19:46:12.7358262Z 2025-05-07T19:46:12.7358267Z 2025-05-07T19:46:12.7538947Z libnpp-12.3.3.65 | 130.6 MB | #######9 | 79%  2025-05-07T19:46:12.7539288Z 2025-05-07T19:46:12.7539295Z 2025-05-07T19:46:12.7539301Z 2025-05-07T19:46:12.7539304Z 2025-05-07T19:46:12.7539308Z 2025-05-07T19:46:12.7539311Z 2025-05-07T19:46:12.7539315Z 2025-05-07T19:46:12.7568891Z cuda-nvvp-12.8.57 | 112.4 MB | ####7 | 47%  2025-05-07T19:46:12.7569236Z 2025-05-07T19:46:12.7569240Z 2025-05-07T19:46:12.7569245Z 2025-05-07T19:46:12.7569494Z 2025-05-07T19:46:12.7569497Z 2025-05-07T19:46:12.7569525Z 2025-05-07T19:46:12.8123545Z cuda-nsight-12.8.55 | 113.2 MB | #######7 
| 78%  2025-05-07T19:46:12.8361862Z libcublas-12.8.3.14 | 460.2 MB | #########7 | 98% 2025-05-07T19:46:12.8362728Z 2025-05-07T19:46:12.8362742Z 2025-05-07T19:46:12.8362754Z 2025-05-07T19:46:12.8362764Z 2025-05-07T19:46:12.8362774Z 2025-05-07T19:46:12.8575134Z libnpp-12.3.3.65 | 130.6 MB | ########3 | 83%  2025-05-07T19:46:12.8575470Z 2025-05-07T19:46:12.8575498Z 2025-05-07T19:46:12.8575502Z 2025-05-07T19:46:12.8575505Z 2025-05-07T19:46:12.8575509Z 2025-05-07T19:46:12.8575513Z 2025-05-07T19:46:12.8609076Z cuda-nsight-12.8.55 | 113.2 MB | ########2 | 83%  2025-05-07T19:46:12.8609459Z 2025-05-07T19:46:12.8609465Z 2025-05-07T19:46:12.8609470Z 2025-05-07T19:46:12.8609475Z 2025-05-07T19:46:12.8609479Z 2025-05-07T19:46:12.8609484Z 2025-05-07T19:46:12.8609490Z 2025-05-07T19:46:12.9198931Z cuda-nvvp-12.8.57 | 112.4 MB | #####1 | 51%  2025-05-07T19:46:12.9398517Z libcublas-12.8.3.14 | 460.2 MB | #########9 | 99% 2025-05-07T19:46:12.9398936Z 2025-05-07T19:46:12.9399305Z 2025-05-07T19:46:12.9399524Z 2025-05-07T19:46:12.9399533Z 2025-05-07T19:46:12.9399538Z 2025-05-07T19:46:12.9612045Z libnpp-12.3.3.65 | 130.6 MB | ########7 | 87%  2025-05-07T19:46:12.9612567Z 2025-05-07T19:46:12.9612655Z 2025-05-07T19:46:12.9612662Z 2025-05-07T19:46:12.9612668Z 2025-05-07T19:46:12.9612673Z 2025-05-07T19:46:12.9612678Z 2025-05-07T19:46:12.9612683Z 2025-05-07T19:46:12.9735759Z cuda-nvvp-12.8.57 | 112.4 MB | #####5 | 56%  2025-05-07T19:46:12.9736139Z 2025-05-07T19:46:12.9736359Z 2025-05-07T19:46:12.9736371Z 2025-05-07T19:46:12.9736378Z 2025-05-07T19:46:12.9736384Z 2025-05-07T19:46:12.9736390Z 2025-05-07T19:46:13.0399678Z cuda-nsight-12.8.55 | 113.2 MB | ########7 | 87%  2025-05-07T19:46:13.0400116Z 2025-05-07T19:46:13.0400124Z 2025-05-07T19:46:13.0400468Z 2025-05-07T19:46:13.0400474Z 2025-05-07T19:46:13.0400478Z 2025-05-07T19:46:13.0612631Z libnpp-12.3.3.65 | 130.6 MB | #########1 | 92%  2025-05-07T19:46:13.0612960Z 2025-05-07T19:46:13.0613146Z 2025-05-07T19:46:13.0613151Z 
2025-05-07T19:46:13.0613156Z 2025-05-07T19:46:13.0613160Z 2025-05-07T19:46:13.0613198Z 2025-05-07T19:46:13.0613202Z 2025-05-07T19:46:13.1120809Z cuda-nvvp-12.8.57 | 112.4 MB | ######1 | 62%  2025-05-07T19:46:13.1121195Z 2025-05-07T19:46:13.1121201Z 2025-05-07T19:46:13.1121206Z 2025-05-07T19:46:13.1121210Z 2025-05-07T19:46:13.1121214Z 2025-05-07T19:46:13.1121220Z 2025-05-07T19:46:13.1401177Z cuda-nsight-12.8.55 | 113.2 MB | #########1 | 91%  2025-05-07T19:46:13.1401552Z 2025-05-07T19:46:13.1401629Z 2025-05-07T19:46:13.1401636Z 2025-05-07T19:46:13.1401727Z 2025-05-07T19:46:13.1401737Z 2025-05-07T19:46:13.1613483Z libnpp-12.3.3.65 | 130.6 MB | #########6 | 97%  2025-05-07T19:46:13.1613916Z 2025-05-07T19:46:13.1613921Z 2025-05-07T19:46:13.1613925Z 2025-05-07T19:46:13.1613941Z 2025-05-07T19:46:13.1613944Z 2025-05-07T19:46:13.1613948Z 2025-05-07T19:46:13.1613951Z 2025-05-07T19:46:13.2121234Z cuda-nvvp-12.8.57 | 112.4 MB | ######7 | 67%  2025-05-07T19:46:13.2121561Z 2025-05-07T19:46:13.2121746Z 2025-05-07T19:46:13.2121761Z 2025-05-07T19:46:13.2121769Z 2025-05-07T19:46:13.2121776Z 2025-05-07T19:46:13.2121782Z 2025-05-07T19:46:13.2614492Z cuda-nsight-12.8.55 | 113.2 MB | #########5 | 96%  2025-05-07T19:46:13.2614846Z 2025-05-07T19:46:13.2614932Z 2025-05-07T19:46:13.2614937Z 2025-05-07T19:46:13.2615040Z 2025-05-07T19:46:13.2615051Z 2025-05-07T19:46:13.2615057Z 2025-05-07T19:46:13.2615062Z 2025-05-07T19:46:13.3620565Z cuda-nvvp-12.8.57 | 112.4 MB | #######4 | 74%  2025-05-07T19:46:13.3620959Z 2025-05-07T19:46:13.3620989Z 2025-05-07T19:46:13.3621275Z 2025-05-07T19:46:13.3621280Z 2025-05-07T19:46:13.3621306Z 2025-05-07T19:46:13.3621310Z 2025-05-07T19:46:13.3621314Z 2025-05-07T19:46:13.4860432Z cuda-nvvp-12.8.57 | 112.4 MB | #######9 | 80%  2025-05-07T19:46:13.4860856Z 2025-05-07T19:46:13.4860863Z 2025-05-07T19:46:13.4860870Z 2025-05-07T19:46:13.4860876Z 2025-05-07T19:46:13.4860882Z 2025-05-07T19:46:13.4860888Z 2025-05-07T19:46:13.4860893Z 2025-05-07T19:46:13.6229763Z 
cuda-nvvp-12.8.57 | 112.4 MB | ########5 | 85%  2025-05-07T19:46:13.6230211Z 2025-05-07T19:46:13.6230216Z 2025-05-07T19:46:13.6230220Z 2025-05-07T19:46:13.6230224Z 2025-05-07T19:46:13.6230228Z 2025-05-07T19:46:13.6230232Z 2025-05-07T19:46:13.6230237Z 2025-05-07T19:46:14.1518403Z cuda-nvvp-12.8.57 | 112.4 MB | ######### | 91%  2025-05-07T19:46:14.1518791Z 2025-05-07T19:46:14.1518796Z 2025-05-07T19:46:14.1518800Z 2025-05-07T19:46:14.1518806Z 2025-05-07T19:46:14.4618074Z libcufft-11.3.3.41 | 147.4 MB | ########## | 100%  2025-05-07T19:46:14.4618505Z 2025-05-07T19:46:14.4618510Z 2025-05-07T19:46:14.4618514Z 2025-05-07T19:46:14.4618517Z 2025-05-07T19:46:14.4618521Z 2025-05-07T19:46:14.4618524Z 2025-05-07T19:46:14.4965438Z cuda-nsight-12.8.55 | 113.2 MB | ########## | 100%  2025-05-07T19:46:14.4965847Z 2025-05-07T19:46:14.4965853Z 2025-05-07T19:46:14.4965858Z 2025-05-07T19:46:14.4965864Z 2025-05-07T19:46:14.4965869Z 2025-05-07T19:46:14.4965874Z 2025-05-07T19:46:14.4965879Z 2025-05-07T19:46:14.4965885Z 2025-05-07T19:46:14.5965946Z cuda-nvrtc-12.8.61 | 63.1 MB | | 0%  2025-05-07T19:46:14.5966375Z 2025-05-07T19:46:14.5966381Z 2025-05-07T19:46:14.5966386Z 2025-05-07T19:46:14.5966393Z 2025-05-07T19:46:14.5966399Z 2025-05-07T19:46:14.5966404Z 2025-05-07T19:46:14.5966408Z 2025-05-07T19:46:14.5966413Z 2025-05-07T19:46:14.6065734Z cuda-nvrtc-12.8.61 | 63.1 MB | #6 | 17%  2025-05-07T19:46:14.6066142Z 2025-05-07T19:46:14.6066401Z 2025-05-07T19:46:14.6066406Z 2025-05-07T19:46:14.6066410Z 2025-05-07T19:46:14.6066413Z 2025-05-07T19:46:14.6511514Z libnpp-12.3.3.65 | 130.6 MB | ########## | 100%  2025-05-07T19:46:14.6511920Z 2025-05-07T19:46:14.6511930Z 2025-05-07T19:46:14.6511935Z 2025-05-07T19:46:14.6511941Z 2025-05-07T19:46:14.6511947Z 2025-05-07T19:46:14.6511954Z 2025-05-07T19:46:14.6511960Z 2025-05-07T19:46:14.6511965Z 2025-05-07T19:46:14.6511970Z 2025-05-07T19:46:14.6968450Z libcurand-10.3.9.55 | 43.6 MB | | 0%  2025-05-07T19:46:14.6968851Z 2025-05-07T19:46:14.6968858Z 
2025-05-07T19:46:14.6968862Z 2025-05-07T19:46:14.6968866Z 2025-05-07T19:46:14.6968869Z 2025-05-07T19:46:14.6968873Z 2025-05-07T19:46:14.6968877Z 2025-05-07T19:46:14.6968880Z 2025-05-07T19:46:14.7510024Z cuda-nvrtc-12.8.61 | 63.1 MB | ###1 | 32%  2025-05-07T19:46:14.7510390Z 2025-05-07T19:46:14.7510422Z 2025-05-07T19:46:14.7510426Z 2025-05-07T19:46:14.7510454Z 2025-05-07T19:46:14.7510459Z 2025-05-07T19:46:14.7510479Z 2025-05-07T19:46:14.7510483Z 2025-05-07T19:46:14.7510488Z 2025-05-07T19:46:14.7510492Z 2025-05-07T19:46:14.7970188Z libcurand-10.3.9.55 | 43.6 MB | #6 | 17%  2025-05-07T19:46:14.7970573Z 2025-05-07T19:46:14.7970578Z 2025-05-07T19:46:14.7970582Z 2025-05-07T19:46:14.7970586Z 2025-05-07T19:46:14.7970590Z 2025-05-07T19:46:14.7970594Z 2025-05-07T19:46:14.7970598Z 2025-05-07T19:46:14.7970603Z 2025-05-07T19:46:14.8516407Z cuda-nvrtc-12.8.61 | 63.1 MB | ####4 | 44%  2025-05-07T19:46:14.8516800Z 2025-05-07T19:46:14.8516806Z 2025-05-07T19:46:14.8516812Z 2025-05-07T19:46:14.8516817Z 2025-05-07T19:46:14.8516822Z 2025-05-07T19:46:14.8516828Z 2025-05-07T19:46:14.8516833Z 2025-05-07T19:46:14.8516841Z 2025-05-07T19:46:14.8516845Z 2025-05-07T19:46:14.8985460Z libcurand-10.3.9.55 | 43.6 MB | ###2 | 33%  2025-05-07T19:46:14.8986130Z 2025-05-07T19:46:14.8986157Z 2025-05-07T19:46:14.8986161Z 2025-05-07T19:46:14.8986165Z 2025-05-07T19:46:14.8986168Z 2025-05-07T19:46:14.8986172Z 2025-05-07T19:46:14.8986175Z 2025-05-07T19:46:14.8986178Z 2025-05-07T19:46:14.9517137Z cuda-nvrtc-12.8.61 | 63.1 MB | #####6 | 56%  2025-05-07T19:46:14.9517518Z 2025-05-07T19:46:14.9517539Z 2025-05-07T19:46:14.9517546Z 2025-05-07T19:46:14.9517553Z 2025-05-07T19:46:14.9517560Z 2025-05-07T19:46:14.9517567Z 2025-05-07T19:46:14.9517574Z 2025-05-07T19:46:14.9517580Z 2025-05-07T19:46:14.9517587Z 2025-05-07T19:46:14.9987405Z libcurand-10.3.9.55 | 43.6 MB | ####9 | 50%  2025-05-07T19:46:14.9987745Z 2025-05-07T19:46:14.9987751Z 2025-05-07T19:46:14.9987757Z 2025-05-07T19:46:14.9987761Z 
2025-05-07T19:46:14.9987770Z 2025-05-07T19:46:14.9987774Z 2025-05-07T19:46:14.9987779Z 2025-05-07T19:46:14.9987782Z 2025-05-07T19:46:15.0119466Z cuda-nvrtc-12.8.61 | 63.1 MB | ######8 | 69%  2025-05-07T19:46:15.0119794Z 2025-05-07T19:46:15.0120029Z 2025-05-07T19:46:15.0120045Z 2025-05-07T19:46:15.0120052Z 2025-05-07T19:46:15.0120115Z 2025-05-07T19:46:15.0120121Z 2025-05-07T19:46:15.0120128Z 2025-05-07T19:46:15.0121036Z cuda-nvvp-12.8.57 | 112.4 MB | ########## | 100%  2025-05-07T19:46:15.0121418Z 2025-05-07T19:46:15.0121422Z 2025-05-07T19:46:15.0121425Z 2025-05-07T19:46:15.0121430Z 2025-05-07T19:46:15.0121433Z 2025-05-07T19:46:15.0121437Z 2025-05-07T19:46:15.0121442Z 2025-05-07T19:46:15.0518684Z cuda-nvvp-12.8.57 | 112.4 MB | ########## | 100%  2025-05-07T19:46:15.0519073Z 2025-05-07T19:46:15.0519080Z 2025-05-07T19:46:15.0519086Z 2025-05-07T19:46:15.0519093Z 2025-05-07T19:46:15.0519101Z 2025-05-07T19:46:15.0519106Z 2025-05-07T19:46:15.0519113Z 2025-05-07T19:46:15.0519141Z 2025-05-07T19:46:15.0519147Z 2025-05-07T19:46:15.0579662Z libcurand-10.3.9.55 | 43.6 MB | ######6 | 67%  2025-05-07T19:46:15.0580055Z 2025-05-07T19:46:15.0580059Z 2025-05-07T19:46:15.0580063Z 2025-05-07T19:46:15.0580066Z 2025-05-07T19:46:15.0580070Z 2025-05-07T19:46:15.0580073Z 2025-05-07T19:46:15.0580102Z 2025-05-07T19:46:15.0580106Z 2025-05-07T19:46:15.0580110Z 2025-05-07T19:46:15.0580113Z 2025-05-07T19:46:15.1063637Z gds-tools-1.13.0.11 | 37.9 MB | | 0%  2025-05-07T19:46:15.1063992Z 2025-05-07T19:46:15.1063999Z 2025-05-07T19:46:15.1064004Z 2025-05-07T19:46:15.1064009Z 2025-05-07T19:46:15.1064046Z 2025-05-07T19:46:15.1064050Z 2025-05-07T19:46:15.1064055Z 2025-05-07T19:46:15.1064060Z 2025-05-07T19:46:15.1584180Z cuda-nvrtc-12.8.61 | 63.1 MB | ######## | 81%  2025-05-07T19:46:15.1585170Z 2025-05-07T19:46:15.1585184Z 2025-05-07T19:46:15.1585195Z 2025-05-07T19:46:15.1585205Z 2025-05-07T19:46:15.1585215Z 2025-05-07T19:46:15.1585226Z 2025-05-07T19:46:15.1585281Z 2025-05-07T19:46:15.1585292Z 
2025-05-07T19:46:15.1585328Z 2025-05-07T19:46:15.1586531Z 2025-05-07T19:46:15.1716234Z gds-tools-1.13.0.11 | 37.9 MB | #5 | 15%  2025-05-07T19:46:15.1716588Z 2025-05-07T19:46:15.1716830Z 2025-05-07T19:46:15.1716844Z 2025-05-07T19:46:15.1716852Z 2025-05-07T19:46:15.1716859Z 2025-05-07T19:46:15.1716865Z 2025-05-07T19:46:15.1716874Z 2025-05-07T19:46:15.1716880Z 2025-05-07T19:46:15.1716887Z 2025-05-07T19:46:15.2327919Z libcurand-10.3.9.55 | 43.6 MB | ########2 | 82%  2025-05-07T19:46:15.2328328Z 2025-05-07T19:46:15.2328358Z 2025-05-07T19:46:15.2328363Z 2025-05-07T19:46:15.2328368Z 2025-05-07T19:46:15.2328374Z 2025-05-07T19:46:15.2328381Z 2025-05-07T19:46:15.2328386Z 2025-05-07T19:46:15.2328391Z 2025-05-07T19:46:15.2581860Z cuda-nvrtc-12.8.61 | 63.1 MB | #########2 | 93%  2025-05-07T19:46:15.2582233Z 2025-05-07T19:46:15.2582239Z 2025-05-07T19:46:15.2582477Z 2025-05-07T19:46:15.2582481Z 2025-05-07T19:46:15.2582503Z 2025-05-07T19:46:15.2582506Z 2025-05-07T19:46:15.2582510Z 2025-05-07T19:46:15.2582513Z 2025-05-07T19:46:15.2582517Z 2025-05-07T19:46:15.2582520Z 2025-05-07T19:46:15.2800515Z gds-tools-1.13.0.11 | 37.9 MB | ##9 | 29%  2025-05-07T19:46:15.2800872Z 2025-05-07T19:46:15.2800876Z 2025-05-07T19:46:15.2800880Z 2025-05-07T19:46:15.2800884Z 2025-05-07T19:46:15.2800887Z 2025-05-07T19:46:15.2800891Z 2025-05-07T19:46:15.2800894Z 2025-05-07T19:46:15.2800897Z 2025-05-07T19:46:15.2800901Z 2025-05-07T19:46:15.3582845Z libcurand-10.3.9.55 | 43.6 MB | #########6 | 97%  2025-05-07T19:46:15.3583236Z 2025-05-07T19:46:15.3583268Z 2025-05-07T19:46:15.3583272Z 2025-05-07T19:46:15.3583276Z 2025-05-07T19:46:15.3583280Z 2025-05-07T19:46:15.3583283Z 2025-05-07T19:46:15.3583288Z 2025-05-07T19:46:15.3583293Z 2025-05-07T19:46:15.3583298Z 2025-05-07T19:46:15.3583302Z 2025-05-07T19:46:15.4221162Z gds-tools-1.13.0.11 | 37.9 MB | ####9 | 49%  2025-05-07T19:46:15.4221541Z 2025-05-07T19:46:15.4221546Z 2025-05-07T19:46:15.4221565Z 2025-05-07T19:46:15.4604124Z libcusolver-11.7.2.5 | 156.9 MB | 
########## | 100%  2025-05-07T19:46:15.4604547Z 2025-05-07T19:46:15.4604552Z 2025-05-07T19:46:15.4604558Z 2025-05-07T19:46:15.4604563Z 2025-05-07T19:46:15.4604569Z 2025-05-07T19:46:15.4604574Z 2025-05-07T19:46:15.4604580Z 2025-05-07T19:46:15.4604586Z 2025-05-07T19:46:15.4604591Z 2025-05-07T19:46:15.4604596Z 2025-05-07T19:46:15.6075142Z gds-tools-1.13.0.11 | 37.9 MB | ######6 | 67%  2025-05-07T19:46:15.6075592Z 2025-05-07T19:46:15.6075597Z 2025-05-07T19:46:15.6075601Z 2025-05-07T19:46:15.6075604Z 2025-05-07T19:46:15.6075608Z 2025-05-07T19:46:15.6075635Z 2025-05-07T19:46:15.6075640Z 2025-05-07T19:46:15.6075643Z 2025-05-07T19:46:15.6075646Z 2025-05-07T19:46:15.6075650Z 2025-05-07T19:46:15.6576353Z gds-tools-1.13.0.11 | 37.9 MB | ########2 | 82%  2025-05-07T19:46:15.6576761Z 2025-05-07T19:46:15.6576766Z 2025-05-07T19:46:15.6576770Z 2025-05-07T19:46:15.6576799Z 2025-05-07T19:46:15.6576803Z 2025-05-07T19:46:15.6576806Z 2025-05-07T19:46:15.7378584Z cuda-nsight-12.8.55 | 113.2 MB | ########## | 100%  2025-05-07T19:46:15.7378978Z 2025-05-07T19:46:15.7749336Z nsight-compute-2025. 
| 320.6 MB | ########## | 100%  2025-05-07T19:46:15.7749726Z 2025-05-07T19:46:15.7749732Z 2025-05-07T19:46:15.7749736Z 2025-05-07T19:46:15.7749740Z 2025-05-07T19:46:15.7749743Z 2025-05-07T19:46:15.7749747Z 2025-05-07T19:46:15.7749751Z 2025-05-07T19:46:15.7749754Z 2025-05-07T19:46:15.7749758Z 2025-05-07T19:46:15.7749762Z 2025-05-07T19:46:15.7749765Z 2025-05-07T19:46:15.8004729Z libnvjitlink-12.8.61 | 28.7 MB | | 0%  2025-05-07T19:46:15.8005137Z 2025-05-07T19:46:15.8005143Z 2025-05-07T19:46:15.8005180Z 2025-05-07T19:46:15.8005184Z 2025-05-07T19:46:15.8005203Z 2025-05-07T19:46:15.8005206Z 2025-05-07T19:46:15.8005210Z 2025-05-07T19:46:15.8005213Z 2025-05-07T19:46:15.8005217Z 2025-05-07T19:46:15.8419657Z libcurand-10.3.9.55 | 43.6 MB | ########## | 100%  2025-05-07T19:46:15.8420298Z 2025-05-07T19:46:15.8420302Z 2025-05-07T19:46:15.8420324Z 2025-05-07T19:46:15.8420328Z 2025-05-07T19:46:15.8420331Z 2025-05-07T19:46:15.8420335Z 2025-05-07T19:46:15.8420339Z 2025-05-07T19:46:15.8420342Z 2025-05-07T19:46:15.8420345Z 2025-05-07T19:46:15.8420349Z 2025-05-07T19:46:15.8420352Z 2025-05-07T19:46:15.8420356Z 2025-05-07T19:46:15.8751962Z cuda-nvcc-tools-12.8 | 24.5 MB | | 0%  2025-05-07T19:46:15.8752348Z 2025-05-07T19:46:15.8752353Z 2025-05-07T19:46:15.8752358Z 2025-05-07T19:46:15.8752367Z 2025-05-07T19:46:15.8752372Z 2025-05-07T19:46:15.8752378Z 2025-05-07T19:46:15.8752383Z 2025-05-07T19:46:15.8752387Z 2025-05-07T19:46:15.8752654Z 2025-05-07T19:46:15.8752690Z 2025-05-07T19:46:15.8752709Z 2025-05-07T19:46:15.9419117Z libnvjitlink-12.8.61 | 28.7 MB | ##9 | 29%  2025-05-07T19:46:15.9419491Z 2025-05-07T19:46:15.9419518Z 2025-05-07T19:46:15.9419524Z 2025-05-07T19:46:15.9419528Z 2025-05-07T19:46:15.9419532Z 2025-05-07T19:46:15.9419560Z 2025-05-07T19:46:15.9419564Z 2025-05-07T19:46:15.9419567Z 2025-05-07T19:46:15.9419571Z 2025-05-07T19:46:15.9419575Z 2025-05-07T19:46:15.9419579Z 2025-05-07T19:46:15.9419582Z 2025-05-07T19:46:15.9701154Z cuda-nvcc-tools-12.8 | 24.5 MB | ##6 | 26%  
2025-05-07T19:46:15.9701531Z 2025-05-07T19:46:15.9701831Z 2025-05-07T19:46:15.9754819Z libcusparse-12.5.7.5 | 164.9 MB | ########## | 100%  2025-05-07T19:46:15.9755144Z 2025-05-07T19:46:15.9755149Z 2025-05-07T19:46:15.9755156Z 2025-05-07T19:46:15.9755162Z 2025-05-07T19:46:15.9755167Z 2025-05-07T19:46:15.9755172Z 2025-05-07T19:46:15.9755238Z 2025-05-07T19:46:15.9755242Z 2025-05-07T19:46:15.9755259Z 2025-05-07T19:46:15.9755263Z 2025-05-07T19:46:15.9755266Z 2025-05-07T19:46:16.0420014Z libnvjitlink-12.8.61 | 28.7 MB | #####6 | 57%  2025-05-07T19:46:16.0420410Z 2025-05-07T19:46:16.0420415Z 2025-05-07T19:46:16.0420442Z 2025-05-07T19:46:16.0420446Z 2025-05-07T19:46:16.0420449Z 2025-05-07T19:46:16.0420453Z 2025-05-07T19:46:16.0420456Z 2025-05-07T19:46:16.0420460Z 2025-05-07T19:46:16.0420463Z 2025-05-07T19:46:16.0420466Z 2025-05-07T19:46:16.0420470Z 2025-05-07T19:46:16.0420474Z 2025-05-07T19:46:16.0756546Z cuda-nvcc-tools-12.8 | 24.5 MB | ####3 | 44%  2025-05-07T19:46:16.0756960Z 2025-05-07T19:46:16.0756967Z 2025-05-07T19:46:16.0756971Z 2025-05-07T19:46:16.0756976Z 2025-05-07T19:46:16.0756980Z 2025-05-07T19:46:16.0756984Z 2025-05-07T19:46:16.0756990Z 2025-05-07T19:46:16.0756994Z 2025-05-07T19:46:16.0756998Z 2025-05-07T19:46:16.0757002Z 2025-05-07T19:46:16.0757044Z 2025-05-07T19:46:16.0966790Z libnvjitlink-12.8.61 | 28.7 MB | ########7 | 87%  2025-05-07T19:46:16.0967203Z 2025-05-07T19:46:16.0967207Z 2025-05-07T19:46:16.0967211Z 2025-05-07T19:46:16.0967215Z 2025-05-07T19:46:16.0967218Z 2025-05-07T19:46:16.0967222Z 2025-05-07T19:46:16.0967225Z 2025-05-07T19:46:16.0967228Z 2025-05-07T19:46:16.1348464Z cuda-nvrtc-12.8.61 | 63.1 MB | ########## | 100%  2025-05-07T19:46:16.1348855Z 2025-05-07T19:46:16.1348860Z 2025-05-07T19:46:16.1348864Z 2025-05-07T19:46:16.1348868Z 2025-05-07T19:46:16.1348872Z 2025-05-07T19:46:16.1348875Z 2025-05-07T19:46:16.1348878Z 2025-05-07T19:46:16.1348883Z 2025-05-07T19:46:16.1348887Z 2025-05-07T19:46:16.1348890Z 2025-05-07T19:46:16.1348894Z 
2025-05-07T19:46:16.1348897Z 2025-05-07T19:46:16.1348900Z 2025-05-07T19:46:16.1419883Z cuda-nvvm-tools-12.8 | 23.5 MB | | 0%  2025-05-07T19:46:16.1420298Z 2025-05-07T19:46:16.1420303Z 2025-05-07T19:46:16.1420320Z 2025-05-07T19:46:16.1420324Z 2025-05-07T19:46:16.1420327Z 2025-05-07T19:46:16.1420331Z 2025-05-07T19:46:16.1420334Z 2025-05-07T19:46:16.1420338Z 2025-05-07T19:46:16.1420341Z 2025-05-07T19:46:16.1420345Z 2025-05-07T19:46:16.1420348Z 2025-05-07T19:46:16.1420352Z 2025-05-07T19:46:16.1904607Z cuda-nvcc-tools-12.8 | 24.5 MB | ###### | 61%  2025-05-07T19:46:16.1904989Z 2025-05-07T19:46:16.1905019Z 2025-05-07T19:46:16.1905026Z 2025-05-07T19:46:16.1905030Z 2025-05-07T19:46:16.1905034Z 2025-05-07T19:46:16.1905039Z 2025-05-07T19:46:16.1905045Z 2025-05-07T19:46:16.1905073Z 2025-05-07T19:46:16.1905077Z 2025-05-07T19:46:16.1905080Z 2025-05-07T19:46:16.1905364Z gds-tools-1.13.0.11 | 37.9 MB | ########## | 100%  2025-05-07T19:46:16.1905677Z 2025-05-07T19:46:16.1905682Z 2025-05-07T19:46:16.1905686Z 2025-05-07T19:46:16.1905690Z 2025-05-07T19:46:16.1905694Z 2025-05-07T19:46:16.1905963Z 2025-05-07T19:46:16.1905981Z 2025-05-07T19:46:16.1906009Z 2025-05-07T19:46:16.1906012Z 2025-05-07T19:46:16.1906016Z 2025-05-07T19:46:16.2285747Z gds-tools-1.13.0.11 | 37.9 MB | ########## | 100%  2025-05-07T19:46:16.2286727Z 2025-05-07T19:46:16.2286743Z 2025-05-07T19:46:16.2286754Z 2025-05-07T19:46:16.2286765Z 2025-05-07T19:46:16.2286776Z 2025-05-07T19:46:16.2286819Z 2025-05-07T19:46:16.2286832Z 2025-05-07T19:46:16.2286843Z 2025-05-07T19:46:16.2286853Z 2025-05-07T19:46:16.2286863Z 2025-05-07T19:46:16.2286873Z 2025-05-07T19:46:16.2286883Z 2025-05-07T19:46:16.2286893Z 2025-05-07T19:46:16.2286903Z 2025-05-07T19:46:16.2348700Z cuda-nvvm-impl-12.8. 
| 20.8 MB | | 0%  2025-05-07T19:46:16.2349109Z 2025-05-07T19:46:16.2349114Z 2025-05-07T19:46:16.2349118Z 2025-05-07T19:46:16.2349126Z 2025-05-07T19:46:16.2349129Z 2025-05-07T19:46:16.2349133Z 2025-05-07T19:46:16.2349136Z 2025-05-07T19:46:16.2349172Z 2025-05-07T19:46:16.2349175Z 2025-05-07T19:46:16.2349195Z 2025-05-07T19:46:16.2349199Z 2025-05-07T19:46:16.2349202Z 2025-05-07T19:46:16.2350069Z 2025-05-07T19:46:16.2594718Z cuda-nvvm-tools-12.8 | 23.5 MB | ###8 | 39%  2025-05-07T19:46:16.2595099Z 2025-05-07T19:46:16.2595103Z 2025-05-07T19:46:16.2595107Z 2025-05-07T19:46:16.2595110Z 2025-05-07T19:46:16.2595114Z 2025-05-07T19:46:16.2595118Z 2025-05-07T19:46:16.2595122Z 2025-05-07T19:46:16.2595125Z 2025-05-07T19:46:16.2595131Z 2025-05-07T19:46:16.2595134Z 2025-05-07T19:46:16.2595161Z 2025-05-07T19:46:16.2595165Z 2025-05-07T19:46:16.3287832Z cuda-nvcc-tools-12.8 | 24.5 MB | #######7 | 78%  2025-05-07T19:46:16.3288246Z 2025-05-07T19:46:16.3288251Z 2025-05-07T19:46:16.3288255Z 2025-05-07T19:46:16.3288260Z 2025-05-07T19:46:16.3288265Z 2025-05-07T19:46:16.3288270Z 2025-05-07T19:46:16.3288301Z 2025-05-07T19:46:16.3288304Z 2025-05-07T19:46:16.3288342Z 2025-05-07T19:46:16.3288346Z 2025-05-07T19:46:16.3288591Z 2025-05-07T19:46:16.3288596Z 2025-05-07T19:46:16.3288599Z 2025-05-07T19:46:16.3288603Z 2025-05-07T19:46:16.3350740Z cuda-nvvm-impl-12.8. 
| 20.8 MB | ##7 | 28%  2025-05-07T19:46:16.3351151Z 2025-05-07T19:46:16.3351155Z 2025-05-07T19:46:16.3351159Z 2025-05-07T19:46:16.3351163Z 2025-05-07T19:46:16.3351166Z 2025-05-07T19:46:16.3351170Z 2025-05-07T19:46:16.3351173Z 2025-05-07T19:46:16.3351176Z 2025-05-07T19:46:16.3351180Z 2025-05-07T19:46:16.3351183Z 2025-05-07T19:46:16.3351187Z 2025-05-07T19:46:16.3351190Z 2025-05-07T19:46:16.3351193Z 2025-05-07T19:46:16.3919503Z cuda-nvvm-tools-12.8 | 23.5 MB | ######3 | 64%  2025-05-07T19:46:16.3919945Z 2025-05-07T19:46:16.3919950Z 2025-05-07T19:46:16.3919953Z 2025-05-07T19:46:16.3919958Z 2025-05-07T19:46:16.3919962Z 2025-05-07T19:46:16.3919965Z 2025-05-07T19:46:16.3919969Z 2025-05-07T19:46:16.3920011Z 2025-05-07T19:46:16.3920014Z 2025-05-07T19:46:16.3920036Z 2025-05-07T19:46:16.3920040Z 2025-05-07T19:46:16.3920043Z 2025-05-07T19:46:16.4289693Z cuda-nvcc-tools-12.8 | 24.5 MB | #########5 | 96%  2025-05-07T19:46:16.4290092Z 2025-05-07T19:46:16.4290097Z 2025-05-07T19:46:16.4290101Z 2025-05-07T19:46:16.4290104Z 2025-05-07T19:46:16.4290109Z 2025-05-07T19:46:16.4290115Z 2025-05-07T19:46:16.4290120Z 2025-05-07T19:46:16.4290124Z 2025-05-07T19:46:16.4290128Z 2025-05-07T19:46:16.4290133Z 2025-05-07T19:46:16.4290138Z 2025-05-07T19:46:16.4290143Z 2025-05-07T19:46:16.4290161Z 2025-05-07T19:46:16.4290166Z 2025-05-07T19:46:16.4354033Z cuda-nvvm-impl-12.8. 
| 20.8 MB | #####9 | 59%  2025-05-07T19:46:16.4354420Z 2025-05-07T19:46:16.4354428Z 2025-05-07T19:46:16.4354434Z 2025-05-07T19:46:16.4354439Z 2025-05-07T19:46:16.4354443Z 2025-05-07T19:46:16.4354447Z 2025-05-07T19:46:16.4354473Z 2025-05-07T19:46:16.4354740Z 2025-05-07T19:46:16.4354744Z 2025-05-07T19:46:16.4354765Z 2025-05-07T19:46:16.4354769Z 2025-05-07T19:46:16.4354772Z 2025-05-07T19:46:16.4354775Z 2025-05-07T19:46:16.5290081Z cuda-nvvm-tools-12.8 | 23.5 MB | #########5 | 95%  2025-05-07T19:46:16.5290537Z 2025-05-07T19:46:16.5290544Z 2025-05-07T19:46:16.5290548Z 2025-05-07T19:46:16.5290552Z 2025-05-07T19:46:16.5290556Z 2025-05-07T19:46:16.5290561Z 2025-05-07T19:46:16.5290565Z 2025-05-07T19:46:16.5290570Z 2025-05-07T19:46:16.5290574Z 2025-05-07T19:46:16.5290579Z 2025-05-07T19:46:16.5290583Z 2025-05-07T19:46:16.5290587Z 2025-05-07T19:46:16.5290591Z 2025-05-07T19:46:16.5290594Z 2025-05-07T19:46:16.5506658Z cuda-nvvm-impl-12.8. | 20.8 MB | #########9 | 100%  2025-05-07T19:46:16.5507070Z 2025-05-07T19:46:16.5507075Z 2025-05-07T19:46:16.5507079Z 2025-05-07T19:46:16.5507082Z 2025-05-07T19:46:16.5507086Z 2025-05-07T19:46:16.5507090Z 2025-05-07T19:46:16.5507131Z 2025-05-07T19:46:16.5507150Z 2025-05-07T19:46:16.5507153Z 2025-05-07T19:46:16.5507157Z 2025-05-07T19:46:16.5507160Z 2025-05-07T19:46:16.5853787Z libnvjitlink-12.8.61 | 28.7 MB | ########## | 100%  2025-05-07T19:46:16.5854184Z 2025-05-07T19:46:16.5854188Z 2025-05-07T19:46:16.5854192Z 2025-05-07T19:46:16.5854196Z 2025-05-07T19:46:16.5854200Z 2025-05-07T19:46:16.5854205Z 2025-05-07T19:46:16.5854210Z 2025-05-07T19:46:16.5854214Z 2025-05-07T19:46:16.5854222Z 2025-05-07T19:46:16.5854228Z 2025-05-07T19:46:16.5854233Z 2025-05-07T19:46:16.5854237Z 2025-05-07T19:46:16.5854242Z 2025-05-07T19:46:16.5854273Z 2025-05-07T19:46:16.5854277Z 2025-05-07T19:46:16.6877189Z cuda-nvcc-dev_linux- | 12.7 MB | | 0%  2025-05-07T19:46:16.6877609Z 2025-05-07T19:46:16.6877615Z 2025-05-07T19:46:16.6877620Z 2025-05-07T19:46:16.6877624Z 
2025-05-07T19:46:16.6877629Z 2025-05-07T19:46:16.6877632Z 2025-05-07T19:46:16.6877673Z 2025-05-07T19:46:16.6877913Z 2025-05-07T19:46:16.6877936Z 2025-05-07T19:46:16.6877939Z 2025-05-07T19:46:16.6877943Z 2025-05-07T19:46:16.6877947Z 2025-05-07T19:46:16.6877950Z 2025-05-07T19:46:16.6877954Z 2025-05-07T19:46:16.6877957Z 2025-05-07T19:46:16.7465223Z cuda-nvcc-dev_linux- | 12.7 MB | ####### | 71%  2025-05-07T19:46:16.7465631Z 2025-05-07T19:46:16.7465637Z 2025-05-07T19:46:16.7465645Z 2025-05-07T19:46:16.7465649Z 2025-05-07T19:46:16.7465652Z 2025-05-07T19:46:16.7465657Z 2025-05-07T19:46:16.7465661Z 2025-05-07T19:46:16.7465665Z 2025-05-07T19:46:16.7465709Z 2025-05-07T19:46:16.7465712Z 2025-05-07T19:46:16.7465716Z 2025-05-07T19:46:16.7465719Z 2025-05-07T19:46:16.7465723Z 2025-05-07T19:46:16.7529596Z cuda-nvvm-tools-12.8 | 23.5 MB | ########## | 100%  2025-05-07T19:46:16.7529977Z 2025-05-07T19:46:16.7529982Z 2025-05-07T19:46:16.7529989Z 2025-05-07T19:46:16.7529995Z 2025-05-07T19:46:16.7530031Z 2025-05-07T19:46:16.7530050Z 2025-05-07T19:46:16.7530054Z 2025-05-07T19:46:16.7530079Z 2025-05-07T19:46:16.7530082Z 2025-05-07T19:46:16.7530085Z 2025-05-07T19:46:16.7530089Z 2025-05-07T19:46:16.7530092Z 2025-05-07T19:46:16.7749714Z cuda-nvcc-tools-12.8 | 24.5 MB | ########## | 100%  2025-05-07T19:46:16.7750120Z 2025-05-07T19:46:16.7750125Z 2025-05-07T19:46:16.7750158Z 2025-05-07T19:46:16.7750162Z 2025-05-07T19:46:16.7750166Z 2025-05-07T19:46:16.7750169Z 2025-05-07T19:46:16.7750172Z 2025-05-07T19:46:16.7750176Z 2025-05-07T19:46:16.7750179Z 2025-05-07T19:46:16.7750183Z 2025-05-07T19:46:16.7750188Z 2025-05-07T19:46:16.7750191Z 2025-05-07T19:46:16.7750195Z 2025-05-07T19:46:16.7750198Z 2025-05-07T19:46:16.7824598Z cuda-nvvm-impl-12.8. 
| 20.8 MB | ########## | 100%  2025-05-07T19:46:16.7825012Z 2025-05-07T19:46:16.7825017Z 2025-05-07T19:46:16.7825021Z 2025-05-07T19:46:16.7825024Z 2025-05-07T19:46:16.7825273Z 2025-05-07T19:46:16.7825295Z 2025-05-07T19:46:16.7825299Z 2025-05-07T19:46:16.7825303Z 2025-05-07T19:46:16.7825307Z 2025-05-07T19:46:16.7825310Z 2025-05-07T19:46:16.7825314Z 2025-05-07T19:46:16.7825317Z 2025-05-07T19:46:16.7825320Z 2025-05-07T19:46:16.7825324Z 2025-05-07T19:46:16.7825327Z 2025-05-07T19:46:16.7825331Z 2025-05-07T19:46:16.7825334Z 2025-05-07T19:46:16.7919230Z cuda-nvdisasm-12.8.5 | 4.9 MB | | 0%  2025-05-07T19:46:16.7919619Z 2025-05-07T19:46:16.7919624Z 2025-05-07T19:46:16.7919627Z 2025-05-07T19:46:16.7919631Z 2025-05-07T19:46:16.7919635Z 2025-05-07T19:46:16.7919638Z 2025-05-07T19:46:16.7919641Z 2025-05-07T19:46:16.7919645Z 2025-05-07T19:46:16.7919648Z 2025-05-07T19:46:16.7919680Z 2025-05-07T19:46:16.7919684Z 2025-05-07T19:46:16.7919687Z 2025-05-07T19:46:16.7919691Z 2025-05-07T19:46:16.7919694Z 2025-05-07T19:46:16.7919698Z 2025-05-07T19:46:16.7919701Z 2025-05-07T19:46:16.8127082Z cuda-sanitizer-api-1 | 8.8 MB | | 0%  2025-05-07T19:46:16.8127484Z 2025-05-07T19:46:16.8127489Z 2025-05-07T19:46:16.8127493Z 2025-05-07T19:46:16.8127496Z 2025-05-07T19:46:16.8127523Z 2025-05-07T19:46:16.8127527Z 2025-05-07T19:46:16.8127530Z 2025-05-07T19:46:16.8127534Z 2025-05-07T19:46:16.8127537Z 2025-05-07T19:46:16.8127541Z 2025-05-07T19:46:16.8127545Z 2025-05-07T19:46:16.8127548Z 2025-05-07T19:46:16.8127552Z 2025-05-07T19:46:16.8127556Z 2025-05-07T19:46:16.8127559Z 2025-05-07T19:46:16.8127562Z 2025-05-07T19:46:16.8127566Z 2025-05-07T19:46:16.8127569Z 2025-05-07T19:46:16.8920087Z cuda-cupti-dev-12.8. 
| 4.0 MB | | 0%  2025-05-07T19:46:16.8920551Z 2025-05-07T19:46:16.8920557Z 2025-05-07T19:46:16.8920562Z 2025-05-07T19:46:16.8920567Z 2025-05-07T19:46:16.8920573Z 2025-05-07T19:46:16.8920576Z 2025-05-07T19:46:16.8920581Z 2025-05-07T19:46:16.8920623Z 2025-05-07T19:46:16.8920627Z 2025-05-07T19:46:16.8920872Z 2025-05-07T19:46:16.8920877Z 2025-05-07T19:46:16.8920881Z 2025-05-07T19:46:16.8920884Z 2025-05-07T19:46:16.8920888Z 2025-05-07T19:46:16.8920891Z 2025-05-07T19:46:16.8920895Z 2025-05-07T19:46:16.9194003Z cuda-sanitizer-api-1 | 8.8 MB | #######1 | 71%  2025-05-07T19:46:16.9194409Z 2025-05-07T19:46:16.9194413Z 2025-05-07T19:46:16.9194418Z 2025-05-07T19:46:16.9194421Z 2025-05-07T19:46:16.9194425Z 2025-05-07T19:46:16.9194428Z 2025-05-07T19:46:16.9194432Z 2025-05-07T19:46:16.9194436Z 2025-05-07T19:46:16.9194466Z 2025-05-07T19:46:16.9194470Z 2025-05-07T19:46:16.9194473Z 2025-05-07T19:46:16.9194476Z 2025-05-07T19:46:16.9194480Z 2025-05-07T19:46:16.9194483Z 2025-05-07T19:46:16.9194487Z 2025-05-07T19:46:16.9262433Z cuda-nvcc-dev_linux- | 12.7 MB | ########## | 100%  2025-05-07T19:46:16.9262844Z 2025-05-07T19:46:16.9262849Z 2025-05-07T19:46:16.9262882Z 2025-05-07T19:46:16.9262886Z 2025-05-07T19:46:16.9262904Z 2025-05-07T19:46:16.9262908Z 2025-05-07T19:46:16.9262911Z 2025-05-07T19:46:16.9262915Z 2025-05-07T19:46:16.9262918Z 2025-05-07T19:46:16.9262922Z 2025-05-07T19:46:16.9262925Z 2025-05-07T19:46:16.9262928Z 2025-05-07T19:46:16.9262932Z 2025-05-07T19:46:16.9262935Z 2025-05-07T19:46:16.9262939Z 2025-05-07T19:46:16.9262943Z 2025-05-07T19:46:16.9262983Z 2025-05-07T19:46:16.9264104Z cuda-nvdisasm-12.8.5 | 4.9 MB | ########## | 100%  2025-05-07T19:46:16.9264502Z 2025-05-07T19:46:16.9264510Z 2025-05-07T19:46:16.9264517Z 2025-05-07T19:46:16.9264523Z 2025-05-07T19:46:16.9264527Z 2025-05-07T19:46:16.9264534Z 2025-05-07T19:46:16.9264540Z 2025-05-07T19:46:16.9264546Z 2025-05-07T19:46:16.9264552Z 2025-05-07T19:46:16.9264558Z 2025-05-07T19:46:16.9264561Z 2025-05-07T19:46:16.9264565Z 
2025-05-07T19:46:16.9264568Z 2025-05-07T19:46:16.9264571Z 2025-05-07T19:46:16.9264575Z 2025-05-07T19:46:16.9264845Z 2025-05-07T19:46:16.9264874Z 2025-05-07T19:46:16.9361535Z cuda-nvdisasm-12.8.5 | 4.9 MB | ########## | 100%  2025-05-07T19:46:16.9361923Z 2025-05-07T19:46:16.9361928Z 2025-05-07T19:46:16.9361932Z 2025-05-07T19:46:16.9361935Z 2025-05-07T19:46:16.9361939Z 2025-05-07T19:46:16.9361943Z 2025-05-07T19:46:16.9361946Z 2025-05-07T19:46:16.9361950Z 2025-05-07T19:46:16.9361953Z 2025-05-07T19:46:16.9361956Z 2025-05-07T19:46:16.9361960Z 2025-05-07T19:46:16.9361976Z 2025-05-07T19:46:16.9361980Z 2025-05-07T19:46:16.9361983Z 2025-05-07T19:46:16.9361987Z 2025-05-07T19:46:16.9361990Z 2025-05-07T19:46:16.9361994Z 2025-05-07T19:46:16.9361997Z 2025-05-07T19:46:16.9362338Z cuda-cupti-dev-12.8. | 4.0 MB | ########## | 100%  2025-05-07T19:46:16.9362686Z 2025-05-07T19:46:16.9362690Z 2025-05-07T19:46:16.9362706Z 2025-05-07T19:46:16.9362709Z 2025-05-07T19:46:16.9362713Z 2025-05-07T19:46:16.9362730Z 2025-05-07T19:46:16.9362742Z 2025-05-07T19:46:16.9362746Z 2025-05-07T19:46:16.9362749Z 2025-05-07T19:46:16.9362753Z 2025-05-07T19:46:16.9362756Z 2025-05-07T19:46:16.9362759Z 2025-05-07T19:46:16.9362763Z 2025-05-07T19:46:16.9362766Z 2025-05-07T19:46:16.9362769Z 2025-05-07T19:46:16.9362773Z 2025-05-07T19:46:16.9362776Z 2025-05-07T19:46:16.9362779Z 2025-05-07T19:46:16.9548868Z cuda-cupti-dev-12.8. 
| 4.0 MB | ########## | 100%  2025-05-07T19:46:16.9549264Z 2025-05-07T19:46:16.9549269Z 2025-05-07T19:46:16.9549272Z 2025-05-07T19:46:16.9549276Z 2025-05-07T19:46:16.9549279Z 2025-05-07T19:46:16.9549282Z 2025-05-07T19:46:16.9549286Z 2025-05-07T19:46:16.9549289Z 2025-05-07T19:46:16.9549293Z 2025-05-07T19:46:16.9549296Z 2025-05-07T19:46:16.9549300Z 2025-05-07T19:46:16.9549303Z 2025-05-07T19:46:16.9549307Z 2025-05-07T19:46:16.9549310Z 2025-05-07T19:46:16.9549327Z 2025-05-07T19:46:16.9549331Z 2025-05-07T19:46:16.9549347Z 2025-05-07T19:46:16.9549351Z 2025-05-07T19:46:16.9549557Z 2025-05-07T19:46:17.0147145Z ... (more hidden) ... 2025-05-07T19:46:17.0147478Z 2025-05-07T19:46:17.0147483Z 2025-05-07T19:46:17.0147487Z 2025-05-07T19:46:17.0147507Z 2025-05-07T19:46:17.0147512Z 2025-05-07T19:46:17.0147516Z 2025-05-07T19:46:17.0147521Z 2025-05-07T19:46:17.0147526Z 2025-05-07T19:46:17.0147530Z 2025-05-07T19:46:17.0147534Z 2025-05-07T19:46:17.0147538Z 2025-05-07T19:46:17.0147543Z 2025-05-07T19:46:17.0147547Z 2025-05-07T19:46:17.0147552Z 2025-05-07T19:46:17.0147556Z 2025-05-07T19:46:17.0147559Z 2025-05-07T19:46:17.0242945Z cuda-sanitizer-api-1 | 8.8 MB | ########## | 100%  2025-05-07T19:46:17.0243366Z 2025-05-07T19:46:17.0243370Z 2025-05-07T19:46:17.0243374Z 2025-05-07T19:46:17.0243378Z 2025-05-07T19:46:17.0243382Z 2025-05-07T19:46:17.0243385Z 2025-05-07T19:46:17.0243388Z 2025-05-07T19:46:17.0243412Z 2025-05-07T19:46:17.0243416Z 2025-05-07T19:46:17.0243435Z 2025-05-07T19:46:17.0243438Z 2025-05-07T19:46:17.0243442Z 2025-05-07T19:46:17.0243445Z 2025-05-07T19:46:17.0243448Z 2025-05-07T19:46:17.0243452Z 2025-05-07T19:46:17.0243455Z 2025-05-07T19:46:17.0243459Z 2025-05-07T19:46:17.0243477Z 2025-05-07T19:46:17.0243481Z 2025-05-07T19:46:17.5076734Z ... (more hidden) ... 
2025-05-07T19:46:17.8859569Z cuda-nvvp-12.8.57 | 112.4 MB | ########## | 100%
2025-05-07T19:46:17.8932208Z libcublas-12.8.3.14 | 460.2 MB | ########## | 100%
2025-05-07T19:46:17.9023882Z libcurand-10.3.9.55 | 43.6 MB | ########## | 100%
2025-05-07T19:46:18.0419550Z libnpp-12.3.3.65 | 130.6 MB | ########## | 100%
2025-05-07T19:46:18.3432792Z gds-tools-1.13.0.11 | 37.9 MB | ########## | 100%
2025-05-07T19:46:18.5833115Z libnvjitlink-12.8.61 | 28.7 MB | ########## | 100%
2025-05-07T19:46:18.6123473Z cuda-nvrtc-12.8.61 | 63.1 MB | ########## | 100%
2025-05-07T19:46:18.8544769Z cuda-nvvm-tools-12.8 | 23.5 MB | ########## | 100%
2025-05-07T19:46:18.8776148Z cuda-nvvm-impl-12.8. | 20.8 MB | ########## | 100%
2025-05-07T19:46:18.9019214Z cuda-nvcc-tools-12.8 | 24.5 MB | ########## | 100%
2025-05-07T19:46:19.0061697Z cuda-nvdisasm-12.8.5 | 4.9 MB | ########## | 100%
2025-05-07T19:46:19.0822696Z cuda-cupti-dev-12.8. | 4.0 MB | ########## | 100%
2025-05-07T19:46:19.1460723Z cuda-nvcc-dev_linux- | 12.7 MB | ########## | 100%
2025-05-07T19:46:19.1636282Z cuda-sanitizer-api-1 | 8.8 MB | ########## | 100%
2025-05-07T19:46:19.1638547Z ... (more hidden) ...
2025-05-07T19:46:21.8403410Z ... (more hidden) ...
2025-05-07T19:46:22.7299427Z nsight-compute-2025. | 320.6 MB | ########## | 100%
2025-05-07T19:46:22.7304256Z libcublas-12.8.3.14 | 460.2 MB | ########## | 100%
2025-05-07T19:46:22.7350005Z  done 2025-05-07T19:46:22.9452417Z Preparing transaction: | / done 2025-05-07T19:46:23.6471904Z Verifying transaction: \ | / - \ | / done 2025-05-07T19:46:23.9515792Z Executing transaction: \ | / done 2025-05-07T19:46:25.8950884Z [INSTALL] Fixing file placements for CUDA 12.8.0+ ... 
2025-05-07T19:46:25.8952084Z [INSTALL] Creating symlinks: libnvToolsExt.so 2025-05-07T19:46:25.8954302Z + ln -sf /github/home/miniconda/envs/build_binary/lib/libnvToolsExt.so.1 /github/home/miniconda/envs/build_binary/lib/libnvToolsExt.so 2025-05-07T19:46:25.8956127Z 2025-05-07T19:46:25.8968518Z 2025-05-07T19:46:25.8969518Z + ln -sf /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvToolsExt.so.1 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvToolsExt.so 2025-05-07T19:46:25.8970334Z 2025-05-07T19:46:25.8983791Z 2025-05-07T19:46:25.8984382Z [INSTALL] Copying nvtx3 headers ... 2025-05-07T19:46:25.8993178Z + cp -r /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCuda.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCudaRt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtOpenCL.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtSync.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtx3.hpp /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtxDetail /github/home/miniconda/envs/build_binary/include/ 2025-05-07T19:46:25.8997157Z 2025-05-07T19:46:25.9119475Z 2025-05-07T19:46:25.9124104Z + cp -r /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCuda.h 
/github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCudaRt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtOpenCL.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtSync.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtx3.hpp /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtxDetail /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include/ 2025-05-07T19:46:25.9128150Z 2025-05-07T19:46:25.9140413Z 2025-05-07T19:46:25.9141572Z [INSTALL] Appending libcuda.so path to LD_LIBRARY_PATH ... 2025-05-07T19:46:25.9533752Z [ENV] Appending to LD_LIBRARY_PATH: /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs ... 2025-05-07T19:46:27.8328503Z + conda env config vars set -n build_binary LD_LIBRARY_PATH=/github/home/miniconda/envs/build_binary/lib:/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs 2025-05-07T19:46:27.8329919Z 2025-05-07T19:46:28.2463514Z 2025-05-07T19:46:28.2466826Z [INSTALL] Setting environment variable NVML_LIB_PATH ... 2025-05-07T19:46:28.2846301Z + conda env config vars set -n build_binary NVML_LIB_PATH=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:46:28.2847933Z 2025-05-07T19:46:28.7100642Z 2025-05-07T19:46:28.7101447Z [INSTALL] Setting environment variable CUDA_INCLUDE_DIRS ... 
2025-05-07T19:46:28.7102614Z + conda env config vars set -n build_binary CUDA_INCLUDE_DIRS="/github/home/miniconda/envs/build_binary/include/:/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include/" 2025-05-07T19:46:28.7103438Z 2025-05-07T19:46:29.1255651Z 2025-05-07T19:46:31.1167846Z [CHECK] cuda_runtime.h found in CONDA_PREFIX PATH (file): /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include/cuda_runtime.h 2025-05-07T19:46:33.0520624Z [CHECK] libcuda.so found in CONDA_PREFIX PATH (file): /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:46:35.0162100Z [CHECK] libnvToolsExt.so found in CONDA_PREFIX PATH (symbolic link): /github/home/miniconda/envs/build_binary/lib/libnvToolsExt.so 2025-05-07T19:46:35.0163041Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvToolsExt.so 2025-05-07T19:46:36.9979472Z [CHECK] libnvidia-ml.so found in CONDA_PREFIX PATH (file): /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libnvidia-ml.so 2025-05-07T19:46:38.8044521Z /github/home/miniconda/envs/build_binary/bin/nvcc 2025-05-07T19:46:38.8045385Z 2025-05-07T19:46:38.8791828Z [CHECK] Binary nvcc found in PATH 2025-05-07T19:46:42.5884958Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:46:42.5886908Z Target: x86_64-conda-linux-gnu 2025-05-07T19:46:42.5887821Z Thread model: posix 2025-05-07T19:46:42.5889449Z InstalledDir: /github/home/miniconda/envs/build_binary/bin 2025-05-07T19:46:42.5890192Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang.cfg 2025-05-07T19:46:42.5890641Z 2025-05-07T19:46:42.6635664Z [INSTALL] Resetting compiler symlinks to clang ... 
2025-05-07T19:46:46.4423889Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:46:46.4424496Z 2025-05-07T19:46:46.4443499Z 2025-05-07T19:46:46.4465196Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:46:46.4466748Z 2025-05-07T19:46:46.4482335Z 2025-05-07T19:46:46.4501385Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang++ /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:46:46.4502918Z 2025-05-07T19:46:46.4519295Z 2025-05-07T19:46:46.4539734Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang++ /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:46:46.4540279Z 2025-05-07T19:46:46.4558204Z 2025-05-07T19:46:46.4558628Z + ls -la /github/home/miniconda/envs/build_binary/etc/conda/activate.d 2025-05-07T19:46:46.4559015Z 2025-05-07T19:46:46.4576235Z total 56 2025-05-07T19:46:46.4576670Z drwxr-xr-x. 2 root root 16384 May 7 19:46 . 2025-05-07T19:46:46.4577545Z drwxr-xr-x. 5 root root 62 May 7 19:44 .. 2025-05-07T19:46:46.4578397Z -rw-r--r--. 2 root root 3778 Jun 10 2024 activate-binutils_linux-64.sh 2025-05-07T19:46:46.4578995Z -rw-r--r--. 2 root root 11630 Jun 10 2024 activate-gcc_linux-64.sh 2025-05-07T19:46:46.4579635Z -rw-r--r--. 2 root root 5190 Jun 10 2024 activate-gxx_linux-64.sh 2025-05-07T19:46:46.4580138Z -rw-r--r--. 2 root root 136 Mar 27 01:27 libglib_activate.sh 2025-05-07T19:46:46.4580999Z -rw-r--r--. 2 root root 873 Jun 5 2024 libxml2_activate.sh 2025-05-07T19:46:46.4581700Z -rw-r--r--. 2 root root 499 Nov 30 04:26 openjdk_activate.sh 2025-05-07T19:46:46.4582455Z -rw-r--r--. 2 root root 2932 Jan 24 22:22 ~cuda-nvcc_activate.sh 2025-05-07T19:46:46.4582774Z 2025-05-07T19:46:46.4583016Z [INSTALL] Removing the -ccbin=CXX hook from NVCC activation scripts ... 
2025-05-07T19:46:46.4583712Z + sed -i /-ccbin=/d /github/home/miniconda/envs/build_binary/etc/conda/activate.d/*cuda-nvcc_activate.sh 2025-05-07T19:46:46.4584205Z 2025-05-07T19:46:46.4595661Z 2025-05-07T19:46:46.4595854Z + conda run -n build_binary c++ --version | grep -i clang 2025-05-07T19:46:46.4596258Z 2025-05-07T19:46:48.4047131Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:46:48.4049840Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang++.cfg 2025-05-07T19:46:48.4051241Z 2025-05-07T19:46:48.4051642Z [BUILD] Setting Clang as the NVCC host compiler: 2025-05-07T19:46:50.3251682Z [BUILD] Setting prepend flags for NVCC ... 2025-05-07T19:46:50.3252741Z + conda env config vars set -n build_binary NVCC_PREPEND_FLAGS="-allow-unsupported-compiler -Xcompiler -stdlib=libstdc++ -ccbin /github/home/miniconda/envs/build_binary/bin/c++" 2025-05-07T19:46:50.3253580Z 2025-05-07T19:46:50.7613536Z 2025-05-07T19:46:50.7613936Z + conda run -n build_binary printenv NVCC_PREPEND_FLAGS 2025-05-07T19:46:50.7614264Z 2025-05-07T19:46:52.6013000Z -allow-unsupported-compiler -Xcompiler -stdlib=libstdc++ -ccbin /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:46:52.6013627Z 2025-05-07T19:46:52.6746308Z 2025-05-07T19:46:52.6746868Z [INFO] Printing out all preprocessor defines in nvcc ... 2025-05-07T19:46:52.6747485Z + conda run -n build_binary nvcc --compiler-options -dM -E -x cu - < /dev/null 2025-05-07T19:46:52.6747847Z 2025-05-07T19:46:54.5573480Z nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:46:54.5574388Z 2025-05-07T19:46:54.5574561Z #define ADJ_ESTERROR 0x0008 2025-05-07T19:46:54.5575159Z #define ADJ_FREQUENCY 0x0002 2025-05-07T19:46:54.5575498Z #define ADJ_MAXERROR 0x0004 2025-05-07T19:46:54.5575821Z #define ADJ_MICRO 0x1000 2025-05-07T19:46:54.5576093Z #define ADJ_NANO 0x2000 2025-05-07T19:46:54.5576396Z #define ADJ_OFFSET 0x0001 2025-05-07T19:46:54.5576690Z #define ADJ_OFFSET_SINGLESHOT 0x8001 2025-05-07T19:46:54.5577046Z #define ADJ_OFFSET_SS_READ 0xa001 2025-05-07T19:46:54.5577452Z #define ADJ_STATUS 0x0010 2025-05-07T19:46:54.5577772Z #define ADJ_TAI 0x0080 2025-05-07T19:46:54.5578044Z #define ADJ_TICK 0x4000 2025-05-07T19:46:54.5578349Z #define ADJ_TIMECONST 0x0020 2025-05-07T19:46:54.5578646Z #define AIO_PRIO_DELTA_MAX 20 2025-05-07T19:46:54.5578986Z #define BC_BASE_MAX _POSIX2_BC_BASE_MAX 2025-05-07T19:46:54.5579347Z #define BC_DIM_MAX _POSIX2_BC_DIM_MAX 2025-05-07T19:46:54.5579679Z #define BC_SCALE_MAX _POSIX2_BC_SCALE_MAX 2025-05-07T19:46:54.5580055Z #define BC_STRING_MAX _POSIX2_BC_STRING_MAX 2025-05-07T19:46:54.5580403Z #define BIG_ENDIAN __BIG_ENDIAN 2025-05-07T19:46:54.5580731Z #define BUFSIZ _IO_BUFSIZ 2025-05-07T19:46:54.5581008Z #define BYTE_ORDER __BYTE_ORDER 2025-05-07T19:46:54.5581332Z #define CHARCLASS_NAME_MAX 2048 2025-05-07T19:46:54.5581623Z #define CHAR_BIT __CHAR_BIT__ 2025-05-07T19:46:54.5581943Z #define CHAR_MAX __SCHAR_MAX__ 2025-05-07T19:46:54.5582233Z #define CHAR_MIN SCHAR_MIN 2025-05-07T19:46:54.5582544Z #define CLOCKS_PER_SEC 1000000l 2025-05-07T19:46:54.5582951Z #define CLOCK_BOOTTIME 7 2025-05-07T19:46:54.5583202Z #define CLOCK_BOOTTIME_ALARM 9 2025-05-07T19:46:54.5583490Z #define CLOCK_MONOTONIC 1 2025-05-07T19:46:54.5583775Z #define CLOCK_MONOTONIC_COARSE 6 2025-05-07T19:46:54.5584098Z #define CLOCK_MONOTONIC_RAW 4 2025-05-07T19:46:54.5584401Z #define CLOCK_PROCESS_CPUTIME_ID 2 2025-05-07T19:46:54.5584739Z #define CLOCK_REALTIME 0 2025-05-07T19:46:54.5585014Z #define CLOCK_REALTIME_ALARM 8 
2025-05-07T19:46:54.5585342Z #define CLOCK_REALTIME_COARSE 5 2025-05-07T19:46:54.5585630Z #define CLOCK_TAI 11 2025-05-07T19:46:54.5588119Z #define CLOCK_THREAD_CPUTIME_ID 3 2025-05-07T19:46:54.5588461Z #define COLL_WEIGHTS_MAX 255 2025-05-07T19:46:54.5588746Z #define CUDARTAPI 2025-05-07T19:46:54.5589031Z #define CUDARTAPI_CDECL 2025-05-07T19:46:54.5589339Z #define CUDART_CB 2025-05-07T19:46:54.5589596Z #define CUDART_DEVICE __device__ 2025-05-07T19:46:54.5589930Z #define CUDART_VERSION 12080 2025-05-07T19:46:54.5590234Z #define CUDA_DOUBLE_MATH_FUNCTIONS 1 2025-05-07T19:46:54.5590581Z #define CUDA_IPC_HANDLE_SIZE 64 2025-05-07T19:46:54.5590880Z #define CU_UUID_HAS_BEEN_DEFINED 2025-05-07T19:46:54.5591224Z #define DELAYTIMER_MAX 2147483647 2025-05-07T19:46:54.5591528Z #define DOMAIN 1 2025-05-07T19:46:54.5591810Z #define EOF (-1) 2025-05-07T19:46:54.5592055Z #define EXIT_FAILURE 1 2025-05-07T19:46:54.5592355Z #define EXIT_SUCCESS 0 2025-05-07T19:46:54.5592681Z #define EXPR_NEST_MAX _POSIX2_EXPR_NEST_MAX 2025-05-07T19:46:54.5593097Z #define FD_CLR(fd,fdsetp) __FD_CLR (fd, fdsetp) 2025-05-07T19:46:54.5593544Z #define FD_ISSET(fd,fdsetp) __FD_ISSET (fd, fdsetp) 2025-05-07T19:46:54.5593948Z #define FD_SET(fd,fdsetp) __FD_SET (fd, fdsetp) 2025-05-07T19:46:54.5594348Z #define FD_SETSIZE __FD_SETSIZE 2025-05-07T19:46:54.5594672Z #define FD_ZERO(fdsetp) __FD_ZERO (fdsetp) 2025-05-07T19:46:54.5595052Z #define FILENAME_MAX 4096 2025-05-07T19:46:54.5595332Z #define FOPEN_MAX 16 2025-05-07T19:46:54.5595666Z #define FP_ILOGB0 (-2147483647 - 1) 2025-05-07T19:46:54.5595987Z #define FP_ILOGBNAN (-2147483647 - 1) 2025-05-07T19:46:54.5596335Z #define FP_INFINITE 1 2025-05-07T19:46:54.5596627Z #define FP_NAN 0 2025-05-07T19:46:54.5596868Z #define FP_NORMAL 4 2025-05-07T19:46:54.5597145Z #define FP_SUBNORMAL 3 2025-05-07T19:46:54.5597399Z #define FP_ZERO 2 2025-05-07T19:46:54.5597672Z #define HOST_NAME_MAX 64 2025-05-07T19:46:54.5597962Z #define HUGE 3.40282347e+38F 
2025-05-07T19:46:54.5598286Z #define HUGE_VAL (__builtin_huge_val()) 2025-05-07T19:46:54.5598639Z #define HUGE_VALF (__builtin_huge_valf()) 2025-05-07T19:46:54.5599108Z #define HUGE_VALL (__builtin_huge_vall()) 2025-05-07T19:46:54.5599459Z #define INFINITY (__builtin_inff()) 2025-05-07T19:46:54.5599814Z #define INT_MAX __INT_MAX__ 2025-05-07T19:46:54.5600155Z #define INT_MIN (-__INT_MAX__ -1) 2025-05-07T19:46:54.5600466Z #define IOV_MAX 1024 2025-05-07T19:46:54.5600772Z #define LINE_MAX _POSIX2_LINE_MAX 2025-05-07T19:46:54.5601093Z #define LITTLE_ENDIAN __LITTLE_ENDIAN 2025-05-07T19:46:54.5601453Z #define LLONG_MAX __LONG_LONG_MAX__ 2025-05-07T19:46:54.5601790Z #define LLONG_MIN (-__LONG_LONG_MAX__-1LL) 2025-05-07T19:46:54.5602165Z #define LOGIN_NAME_MAX 256 2025-05-07T19:46:54.5602448Z #define LONG_BIT 64 2025-05-07T19:46:54.5602748Z #define LONG_LONG_MAX __LONG_LONG_MAX__ 2025-05-07T19:46:54.5603120Z #define LONG_LONG_MIN (-__LONG_LONG_MAX__-1LL) 2025-05-07T19:46:54.5603462Z #define LONG_MAX __LONG_MAX__ 2025-05-07T19:46:54.5603785Z #define LONG_MIN (-__LONG_MAX__ -1L) 2025-05-07T19:46:54.5604089Z #define L_ctermid 9 2025-05-07T19:46:54.5604358Z #define L_cuserid 9 2025-05-07T19:46:54.5604610Z #define L_tmpnam 20 2025-05-07T19:46:54.5604886Z #define MATH_ERREXCEPT 2 2025-05-07T19:46:54.5605149Z #define MATH_ERRNO 1 2025-05-07T19:46:54.5605422Z #define MAX_CANON 255 2025-05-07T19:46:54.5605675Z #define MAX_INPUT 255 2025-05-07T19:46:54.5605994Z #define MB_CUR_MAX (__ctype_get_mb_cur_max ()) 2025-05-07T19:46:54.5606329Z #define MB_LEN_MAX 16 2025-05-07T19:46:54.5606627Z #define MOD_CLKA ADJ_OFFSET_SINGLESHOT 2025-05-07T19:46:54.5606974Z #define MOD_CLKB ADJ_TICK 2025-05-07T19:46:54.5607260Z #define MOD_ESTERROR ADJ_ESTERROR 2025-05-07T19:46:54.5607597Z #define MOD_FREQUENCY ADJ_FREQUENCY 2025-05-07T19:46:54.5607913Z #define MOD_MAXERROR ADJ_MAXERROR 2025-05-07T19:46:54.5608242Z #define MOD_MICRO ADJ_MICRO 2025-05-07T19:46:54.5608527Z #define MOD_NANO ADJ_NANO 
2025-05-07T19:46:54.5608827Z #define MOD_OFFSET ADJ_OFFSET 2025-05-07T19:46:54.5609114Z #define MOD_STATUS ADJ_STATUS 2025-05-07T19:46:54.5609421Z #define MOD_TAI ADJ_TAI 2025-05-07T19:46:54.5609698Z #define MOD_TIMECONST ADJ_TIMECONST 2025-05-07T19:46:54.5610133Z #define MQ_PRIO_MAX 32768 2025-05-07T19:46:54.5610438Z #define M_1_PI 0.31830988618379067154 2025-05-07T19:46:54.5610776Z #define M_1_PIl 0.318309886183790671537767526745028724L 2025-05-07T19:46:54.5611161Z #define M_2_PI 0.63661977236758134308 2025-05-07T19:46:54.5611500Z #define M_2_PIl 0.636619772367581343075535053490057448L 2025-05-07T19:46:54.5611895Z #define M_2_SQRTPI 1.12837916709551257390 2025-05-07T19:46:54.5612269Z #define M_2_SQRTPIl 1.128379167095512573896158903121545172L 2025-05-07T19:46:54.5612673Z #define M_E 2.7182818284590452354 2025-05-07T19:46:54.5613004Z #define M_El 2.718281828459045235360287471352662498L 2025-05-07T19:46:54.5613379Z #define M_LN10 2.30258509299404568402 2025-05-07T19:46:54.5613762Z #define M_LN10l 2.302585092994045684017991454684364208L 2025-05-07T19:46:54.5614126Z #define M_LN2 0.69314718055994530942 2025-05-07T19:46:54.5614502Z #define M_LN2l 0.693147180559945309417232121458176568L 2025-05-07T19:46:54.5614864Z #define M_LOG10E 0.43429448190325182765 2025-05-07T19:46:54.5615269Z #define M_LOG10El 0.434294481903251827651128918916605082L 2025-05-07T19:46:54.5615638Z #define M_LOG2E 1.4426950408889634074 2025-05-07T19:46:54.5616019Z #define M_LOG2El 1.442695040888963407359924681001892137L 2025-05-07T19:46:54.5616381Z #define M_PI 3.14159265358979323846 2025-05-07T19:46:54.5616714Z #define M_PI_2 1.57079632679489661923 2025-05-07T19:46:54.5617083Z #define M_PI_2l 1.570796326794896619231321691639751442L 2025-05-07T19:46:54.5617539Z #define M_PI_4 0.78539816339744830962 2025-05-07T19:46:54.5617923Z #define M_PI_4l 0.785398163397448309615660845819875721L 2025-05-07T19:46:54.5618374Z #define M_PIl 3.141592653589793238462643383279502884L 2025-05-07T19:46:54.5618772Z #define 
M_SQRT1_2 0.70710678118654752440 2025-05-07T19:46:54.5619146Z #define M_SQRT1_2l 0.707106781186547524400844362104849039L 2025-05-07T19:46:54.5619547Z #define M_SQRT2 1.41421356237309504880 2025-05-07T19:46:54.5619900Z #define M_SQRT2l 1.414213562373095048801688724209698079L 2025-05-07T19:46:54.5620375Z #define NAME_MAX 255 2025-05-07T19:46:54.5620688Z #define NAN (__builtin_nanf ("")) 2025-05-07T19:46:54.5620998Z #define NFDBITS __NFDBITS 2025-05-07T19:46:54.5621309Z #define NGROUPS_MAX 65536 2025-05-07T19:46:54.5621599Z #define NL_ARGMAX _POSIX_ARG_MAX 2025-05-07T19:46:54.5621945Z #define NL_LANGMAX _POSIX2_LINE_MAX 2025-05-07T19:46:54.5622263Z #define NL_MSGMAX INT_MAX 2025-05-07T19:46:54.5622571Z #define NL_NMAX INT_MAX 2025-05-07T19:46:54.5622839Z #define NL_SETMAX INT_MAX 2025-05-07T19:46:54.5623148Z #define NL_TEXTMAX INT_MAX 2025-05-07T19:46:54.5623425Z #define NULL __null 2025-05-07T19:46:54.5623812Z #define NZERO 20 2025-05-07T19:46:54.5624069Z #define OVERFLOW 3 2025-05-07T19:46:54.5624314Z #define PATH_MAX 4096 2025-05-07T19:46:54.5624622Z #define PDP_ENDIAN __PDP_ENDIAN 2025-05-07T19:46:54.5624911Z #define PIPE_BUF 4096 2025-05-07T19:46:54.5625188Z #define PLOSS 6 2025-05-07T19:46:54.5625561Z #define PTHREAD_DESTRUCTOR_ITERATIONS _POSIX_THREAD_DESTRUCTOR_ITERATIONS 2025-05-07T19:46:54.5626057Z #define PTHREAD_KEYS_MAX 1024 2025-05-07T19:46:54.5626347Z #define PTHREAD_STACK_MIN 16384 2025-05-07T19:46:54.5626661Z #define P_tmpdir "/tmp" 2025-05-07T19:46:54.5626923Z #define RAND_MAX 2147483647 2025-05-07T19:46:54.5627224Z #define RE_DUP_MAX (0x7fff) 2025-05-07T19:46:54.5627515Z #define RTSIG_MAX 32 2025-05-07T19:46:54.5627770Z #define SCHAR_MAX __SCHAR_MAX__ 2025-05-07T19:46:54.5628089Z #define SCHAR_MIN (-__SCHAR_MAX__-1) 2025-05-07T19:46:54.5628389Z #define SEEK_CUR 1 2025-05-07T19:46:54.5628658Z #define SEEK_DATA 3 2025-05-07T19:46:54.5628893Z #define SEEK_END 2 2025-05-07T19:46:54.5629155Z #define SEEK_HOLE 4 2025-05-07T19:46:54.5629393Z #define 
SEEK_SET 0 2025-05-07T19:46:54.5629668Z #define SEM_VALUE_MAX (2147483647) 2025-05-07T19:46:54.5629969Z #define SHRT_MAX __SHRT_MAX__ 2025-05-07T19:46:54.5630295Z #define SHRT_MIN (-__SHRT_MAX__ -1) 2025-05-07T19:46:54.5630597Z #define SING 2 2025-05-07T19:46:54.5630862Z #define SSIZE_MAX LONG_MAX 2025-05-07T19:46:54.5631253Z #define STA_CLK 0x8000 2025-05-07T19:46:54.5631533Z #define STA_CLOCKERR 0x1000 2025-05-07T19:46:54.5631834Z #define STA_DEL 0x0020 2025-05-07T19:46:54.5632089Z #define STA_FLL 0x0008 2025-05-07T19:46:54.5632381Z #define STA_FREQHOLD 0x0080 2025-05-07T19:46:54.5632655Z #define STA_INS 0x0010 2025-05-07T19:46:54.5632946Z #define STA_MODE 0x4000 2025-05-07T19:46:54.5633211Z #define STA_NANO 0x2000 2025-05-07T19:46:54.5633492Z #define STA_PLL 0x0001 2025-05-07T19:46:54.5633754Z #define STA_PPSERROR 0x0800 2025-05-07T19:46:54.5634062Z #define STA_PPSFREQ 0x0002 2025-05-07T19:46:54.5634376Z #define STA_PPSJITTER 0x0200 2025-05-07T19:46:54.5634663Z #define STA_PPSSIGNAL 0x0100 2025-05-07T19:46:54.5634972Z #define STA_PPSTIME 0x0004 2025-05-07T19:46:54.5635250Z #define STA_PPSWANDER 0x0400 2025-05-07T19:46:54.5635860Z #define STA_RONLY (STA_PPSSIGNAL | STA_PPSJITTER | STA_PPSWANDER | STA_PPSERROR | STA_CLOCKERR | STA_NANO | STA_MODE | STA_CLK) 2025-05-07T19:46:54.5636479Z #define STA_UNSYNC 0x0040 2025-05-07T19:46:54.5636780Z #define TIMER_ABSTIME 1 2025-05-07T19:46:54.5637046Z #define TIME_UTC 1 2025-05-07T19:46:54.5637306Z #define TLOSS 5 2025-05-07T19:46:54.5637537Z #define TMP_MAX 238328 2025-05-07T19:46:54.5637815Z #define TTY_NAME_MAX 32 2025-05-07T19:46:54.5638112Z #define UCHAR_MAX (__SCHAR_MAX__*2 +1) 2025-05-07T19:46:54.5638436Z #define UINT_MAX (__INT_MAX__ *2U +1U) 2025-05-07T19:46:54.5638805Z #define ULLONG_MAX (__LONG_LONG_MAX__*2ULL+1ULL) 2025-05-07T19:46:54.5639190Z #define ULONG_LONG_MAX (__LONG_LONG_MAX__*2ULL+1ULL) 2025-05-07T19:46:54.5639586Z #define ULONG_MAX (__LONG_MAX__ *2UL+1UL) 2025-05-07T19:46:54.5639895Z #define UNDERFLOW 
4 2025-05-07T19:46:54.5640180Z #define USHRT_MAX (__SHRT_MAX__ *2 +1) 2025-05-07T19:46:54.5640482Z #define WCONTINUED 8 2025-05-07T19:46:54.5640757Z #define WEXITED 4 2025-05-07T19:46:54.5641088Z #define WEXITSTATUS(status) __WEXITSTATUS (__WAIT_INT (status)) 2025-05-07T19:46:54.5641616Z #define WIFCONTINUED(status) __WIFCONTINUED (__WAIT_INT (status)) 2025-05-07T19:46:54.5642202Z #define WIFEXITED(status) __WIFEXITED (__WAIT_INT (status)) 2025-05-07T19:46:54.5642675Z #define WIFSIGNALED(status) __WIFSIGNALED (__WAIT_INT (status)) 2025-05-07T19:46:54.5643187Z #define WIFSTOPPED(status) __WIFSTOPPED (__WAIT_INT (status)) 2025-05-07T19:46:54.5643570Z #define WNOHANG 1 2025-05-07T19:46:54.5643847Z #define WNOWAIT 0x01000000 2025-05-07T19:46:54.5644112Z #define WORD_BIT 32 2025-05-07T19:46:54.5644385Z #define WSTOPPED 2 2025-05-07T19:46:54.5644696Z #define WSTOPSIG(status) __WSTOPSIG (__WAIT_INT (status)) 2025-05-07T19:46:54.5645161Z #define WTERMSIG(status) __WTERMSIG (__WAIT_INT (status)) 2025-05-07T19:46:54.5645566Z #define WUNTRACED 2 2025-05-07T19:46:54.5645816Z #define XATTR_LIST_MAX 65536 2025-05-07T19:46:54.5646127Z #define XATTR_NAME_MAX 255 2025-05-07T19:46:54.5646404Z #define XATTR_SIZE_MAX 65536 2025-05-07T19:46:54.5646725Z #define X_TLOSS 1.41484755040568800000e+16 2025-05-07T19:46:54.5647039Z #define _ACRTIMP 2025-05-07T19:46:54.5647305Z #define _ALLOCA_H 1 2025-05-07T19:46:54.5647550Z #define _ASSERT_H 1 2025-05-07T19:46:54.5647829Z #define _ATFILE_SOURCE 1 2025-05-07T19:46:54.5648102Z #define _BITS_BYTESWAP_H 1 2025-05-07T19:46:54.5648408Z #define _BITS_POSIX1_LIM_H 1 2025-05-07T19:46:54.5648725Z #define _BITS_POSIX2_LIM_H 1 2025-05-07T19:46:54.5649013Z #define _BITS_PTHREADTYPES_H 1 2025-05-07T19:46:54.5649327Z #define _BITS_TIMEX_H 1 2025-05-07T19:46:54.5649580Z #define _BITS_TIME_H 1 2025-05-07T19:46:54.5649840Z #define _BITS_TYPESIZES_H 1 2025-05-07T19:46:54.5650092Z #define _BITS_TYPES_H 1 2025-05-07T19:46:54.5650336Z #define _BSD_SOURCE 1 
2025-05-07T19:46:54.5650569Z #define _CONCEPT_CHECK_H 1 2025-05-07T19:46:54.5650840Z #define _CPP_TYPE_TRAITS_H 1 2025-05-07T19:46:54.5651090Z #define _CRTIMP 2025-05-07T19:46:54.5651318Z #define _CTYPE_H 1 2025-05-07T19:46:54.5651530Z #define _ENDIAN_H 1 2025-05-07T19:46:54.5651775Z #define _EXCEPTION_DEFINES_H 1 2025-05-07T19:46:54.5652061Z #define _EXT_NUMERIC_TRAITS 1 2025-05-07T19:46:54.5652326Z #define _EXT_TYPE_TRAITS 1 2025-05-07T19:46:54.5652688Z #define _FEATURES_H 1 2025-05-07T19:46:54.5652924Z #define _FUNCTEXCEPT_H 1 2025-05-07T19:46:54.5653186Z #define _GCC_LIMITS_H_ 2025-05-07T19:46:54.5653468Z #define _GLIBCXX11_DEPRECATED _GLIBCXX_DEPRECATED 2025-05-07T19:46:54.5653954Z #define _GLIBCXX11_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT) 2025-05-07T19:46:54.5654398Z #define _GLIBCXX11_USE_C99_COMPLEX 1 2025-05-07T19:46:54.5654705Z #define _GLIBCXX11_USE_C99_MATH 1 2025-05-07T19:46:54.5654982Z #define _GLIBCXX11_USE_C99_STDIO 1 2025-05-07T19:46:54.5655286Z #define _GLIBCXX11_USE_C99_STDLIB 1 2025-05-07T19:46:54.5655582Z #define _GLIBCXX11_USE_C99_WCHAR 1 2025-05-07T19:46:54.5655869Z #define _GLIBCXX14_CONSTEXPR constexpr 2025-05-07T19:46:54.5656184Z #define _GLIBCXX17_CONSTEXPR constexpr 2025-05-07T19:46:54.5656505Z #define _GLIBCXX17_DEPRECATED [[__deprecated__]] 2025-05-07T19:46:54.5656972Z #define _GLIBCXX17_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT) 2025-05-07T19:46:54.5657487Z #define _GLIBCXX17_INLINE inline 2025-05-07T19:46:54.5657952Z #define _GLIBCXX20_CONSTEXPR 2025-05-07T19:46:54.5658230Z #define _GLIBCXX20_DEPRECATED(MSG) 2025-05-07T19:46:54.5658607Z #define _GLIBCXX20_DEPRECATED_SUGGEST(ALT) 2025-05-07T19:46:54.5658938Z #define _GLIBCXX98_USE_C99_COMPLEX 1 2025-05-07T19:46:54.5659231Z #define _GLIBCXX98_USE_C99_MATH 1 2025-05-07T19:46:54.5659523Z #define _GLIBCXX98_USE_C99_STDIO 1 2025-05-07T19:46:54.5659806Z #define _GLIBCXX98_USE_C99_STDLIB 1 2025-05-07T19:46:54.5660105Z #define _GLIBCXX98_USE_C99_WCHAR 1 
2025-05-07T19:46:54.5660476Z #define _GLIBCXX_ABI_TAG_CXX11 __attribute ((__abi_tag__ ("cxx11"))) 2025-05-07T19:46:54.5660894Z #define _GLIBCXX_ATOMIC_BUILTINS 1 2025-05-07T19:46:54.5661199Z #define _GLIBCXX_BEGIN_EXTERN_C extern "C" { 2025-05-07T19:46:54.5661539Z #define _GLIBCXX_BEGIN_NAMESPACE_ALGO 2025-05-07T19:46:54.5661873Z #define _GLIBCXX_BEGIN_NAMESPACE_CONTAINER 2025-05-07T19:46:54.5662256Z #define _GLIBCXX_BEGIN_NAMESPACE_CXX11 namespace __cxx11 { 2025-05-07T19:46:54.5662732Z #define _GLIBCXX_BEGIN_NAMESPACE_LDBL 2025-05-07T19:46:54.5663174Z #define _GLIBCXX_BEGIN_NAMESPACE_LDBL_OR_CXX11 _GLIBCXX_BEGIN_NAMESPACE_CXX11 2025-05-07T19:46:54.5663692Z #define _GLIBCXX_BEGIN_NAMESPACE_VERSION 2025-05-07T19:46:54.5664037Z #define _GLIBCXX_BITS_SPECFUN_H 1 2025-05-07T19:46:54.5664390Z #define _GLIBCXX_BITS_STD_ABS_H 2025-05-07T19:46:54.5664691Z #define _GLIBCXX_CMATH 1 2025-05-07T19:46:54.5665043Z #define _GLIBCXX_CONST __attribute__ ((__const__)) 2025-05-07T19:46:54.5665450Z #define _GLIBCXX_CONSTEXPR constexpr 2025-05-07T19:46:54.5665772Z #define _GLIBCXX_CPU_DEFINES 1 2025-05-07T19:46:54.5666074Z #define _GLIBCXX_CSTDLIB 1 2025-05-07T19:46:54.5666330Z #define _GLIBCXX_CXX_CONFIG_H 1 2025-05-07T19:46:54.5666631Z #define _GLIBCXX_DARWIN_USE_64_BIT_INODE 1 2025-05-07T19:46:54.5666956Z #define _GLIBCXX_DEBUG_ASSERT(_Condition) 2025-05-07T19:46:54.5667289Z #define _GLIBCXX_DEBUG_ASSERTIONS_H 1 2025-05-07T19:46:54.5668314Z #define _GLIBCXX_DEBUG_MACRO_SWITCH_H 1 2025-05-07T19:46:54.5668662Z #define _GLIBCXX_DEBUG_ONLY(_Statement) 2025-05-07T19:46:54.5668998Z #define _GLIBCXX_DEBUG_PEDASSERT(_Condition) 2025-05-07T19:46:54.5669393Z #define _GLIBCXX_DEFAULT_ABI_TAG _GLIBCXX_ABI_TAG_CXX11 2025-05-07T19:46:54.5669844Z #define _GLIBCXX_DEPRECATED __attribute__ ((__deprecated__)) 2025-05-07T19:46:54.5670432Z #define _GLIBCXX_DEPRECATED_SUGGEST(ALT) __attribute__ ((__deprecated__ ("use '" ALT "' instead"))) 2025-05-07T19:46:54.5670985Z #define 
_GLIBCXX_DOUBLE_IS_IEEE_BINARY64 1 2025-05-07T19:46:54.5671297Z #define _GLIBCXX_END_EXTERN_C } 2025-05-07T19:46:54.5671593Z #define _GLIBCXX_END_NAMESPACE_ALGO 2025-05-07T19:46:54.5671899Z #define _GLIBCXX_END_NAMESPACE_CONTAINER 2025-05-07T19:46:54.5672233Z #define _GLIBCXX_END_NAMESPACE_CXX11 } 2025-05-07T19:46:54.5672550Z #define _GLIBCXX_END_NAMESPACE_LDBL 2025-05-07T19:46:54.5672958Z #define _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11 _GLIBCXX_END_NAMESPACE_CXX11 2025-05-07T19:46:54.5673409Z #define _GLIBCXX_END_NAMESPACE_VERSION 2025-05-07T19:46:54.5673883Z #define _GLIBCXX_EXTERN_TEMPLATE 1 2025-05-07T19:46:54.5674208Z #define _GLIBCXX_FAST_MATH 0 2025-05-07T19:46:54.5674501Z #define _GLIBCXX_FLOAT_IS_IEEE_BINARY32 1 2025-05-07T19:46:54.5674958Z #define _GLIBCXX_FORWARD(_Tp,__val) std::forward<_Tp>(__val) 2025-05-07T19:46:54.5675352Z #define _GLIBCXX_FULLY_DYNAMIC_STRING 0 2025-05-07T19:46:54.5675675Z #define _GLIBCXX_FWDREF(_Tp) _Tp&& 2025-05-07T19:46:54.5675960Z #define _GLIBCXX_HAS_GTHREADS 1 2025-05-07T19:46:54.5676896Z #define _GLIBCXX_HAS_NESTED_TYPE(_NTYPE) template<typename _Tp, typename = __void_t<>> struct __has_##_NTYPE : false_type { }; template<typename _Tp> struct __has_##_NTYPE<_Tp, __void_t<typename _Tp::_NTYPE>> : true_type { }; 2025-05-07T19:46:54.5677966Z #define _GLIBCXX_HAVE_ACOSF 1 2025-05-07T19:46:54.5678235Z #define _GLIBCXX_HAVE_ACOSL 1 2025-05-07T19:46:54.5678527Z #define _GLIBCXX_HAVE_ALIGNED_ALLOC 1 2025-05-07T19:46:54.5678842Z #define _GLIBCXX_HAVE_ARPA_INET_H 1 2025-05-07T19:46:54.5679128Z #define _GLIBCXX_HAVE_ASINF 1 2025-05-07T19:46:54.5679527Z #define _GLIBCXX_HAVE_ASINL 1 2025-05-07T19:46:54.5679907Z #define _GLIBCXX_HAVE_AS_SYMVER_DIRECTIVE 1 2025-05-07T19:46:54.5680215Z #define _GLIBCXX_HAVE_ATAN2F 1 2025-05-07T19:46:54.5680463Z #define _GLIBCXX_HAVE_ATAN2L 1 2025-05-07T19:46:54.5680725Z #define _GLIBCXX_HAVE_ATANF 1 2025-05-07T19:46:54.5680972Z #define _GLIBCXX_HAVE_ATANL 1 2025-05-07T19:46:54.5681250Z #define _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY 1 2025-05-07T19:46:54.5681564Z #define 
_GLIBCXX_HAVE_ATTRIBUTE_VISIBILITY 1 2025-05-07T19:46:54.5681884Z #define _GLIBCXX_HAVE_AT_QUICK_EXIT 1 2025-05-07T19:46:54.5682206Z #define _GLIBCXX_HAVE_BUILTIN_HAS_UNIQ_OBJ_REP 1 2025-05-07T19:46:54.5682533Z #define _GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE 1 2025-05-07T19:46:54.5682886Z #define _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED 1 2025-05-07T19:46:54.5683219Z #define _GLIBCXX_HAVE_BUILTIN_IS_SAME 1 2025-05-07T19:46:54.5683522Z #define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1 2025-05-07T19:46:54.5683795Z #define _GLIBCXX_HAVE_CEILF 1 2025-05-07T19:46:54.5684173Z #define _GLIBCXX_HAVE_CEILL 1 2025-05-07T19:46:54.5684429Z #define _GLIBCXX_HAVE_COMPLEX_H 1 2025-05-07T19:46:54.5684708Z #define _GLIBCXX_HAVE_COSF 1 2025-05-07T19:46:54.5684975Z #define _GLIBCXX_HAVE_COSHF 1 2025-05-07T19:46:54.5685226Z #define _GLIBCXX_HAVE_COSHL 1 2025-05-07T19:46:54.5685492Z #define _GLIBCXX_HAVE_COSL 1 2025-05-07T19:46:54.5685746Z #define _GLIBCXX_HAVE_DIRENT_H 1 2025-05-07T19:46:54.5686023Z #define _GLIBCXX_HAVE_DLFCN_H 1 2025-05-07T19:46:54.5686281Z #define _GLIBCXX_HAVE_ENDIAN_H 1 2025-05-07T19:46:54.5686589Z #define _GLIBCXX_HAVE_EXCEPTION_PTR_SINCE_GCC46 1 2025-05-07T19:46:54.5686911Z #define _GLIBCXX_HAVE_EXECINFO_H 1 2025-05-07T19:46:54.5687197Z #define _GLIBCXX_HAVE_EXPF 1 2025-05-07T19:46:54.5687446Z #define _GLIBCXX_HAVE_EXPL 1 2025-05-07T19:46:54.5687713Z #define _GLIBCXX_HAVE_FABSF 1 2025-05-07T19:46:54.5687988Z #define _GLIBCXX_HAVE_FABSL 1 2025-05-07T19:46:54.5688238Z #define _GLIBCXX_HAVE_FCNTL_H 1 2025-05-07T19:46:54.5688511Z #define _GLIBCXX_HAVE_FENV_H 1 2025-05-07T19:46:54.5688773Z #define _GLIBCXX_HAVE_FINITE 1 2025-05-07T19:46:54.5689043Z #define _GLIBCXX_HAVE_FINITEF 1 2025-05-07T19:46:54.5689297Z #define _GLIBCXX_HAVE_FINITEL 1 2025-05-07T19:46:54.5689568Z #define _GLIBCXX_HAVE_FLOAT_H 1 2025-05-07T19:46:54.5689819Z #define _GLIBCXX_HAVE_FLOORF 1 2025-05-07T19:46:54.5690088Z #define _GLIBCXX_HAVE_FLOORL 1 2025-05-07T19:46:54.5690340Z #define _GLIBCXX_HAVE_FMODF 1 
2025-05-07T19:46:54.5690604Z #define _GLIBCXX_HAVE_FMODL 1 2025-05-07T19:46:54.5690866Z #define _GLIBCXX_HAVE_FREXPF 1 2025-05-07T19:46:54.5691116Z #define _GLIBCXX_HAVE_FREXPL 1 2025-05-07T19:46:54.5691379Z #define _GLIBCXX_HAVE_GETIPINFO 1 2025-05-07T19:46:54.5691643Z #define _GLIBCXX_HAVE_GETS 1 2025-05-07T19:46:54.5691901Z #define _GLIBCXX_HAVE_HYPOT 1 2025-05-07T19:46:54.5692146Z #define _GLIBCXX_HAVE_HYPOTF 1 2025-05-07T19:46:54.5692404Z #define _GLIBCXX_HAVE_HYPOTL 1 2025-05-07T19:46:54.5692651Z #define _GLIBCXX_HAVE_ICONV 1 2025-05-07T19:46:54.5692907Z #define _GLIBCXX_HAVE_INT64_T 1 2025-05-07T19:46:54.5693241Z #define _GLIBCXX_HAVE_INT64_T_LONG 1 2025-05-07T19:46:54.5693531Z #define _GLIBCXX_HAVE_INTTYPES_H 1 2025-05-07T19:46:54.5693823Z #define _GLIBCXX_HAVE_ISINF 1 2025-05-07T19:46:54.5694071Z #define _GLIBCXX_HAVE_ISINFF 1 2025-05-07T19:46:54.5694333Z #define _GLIBCXX_HAVE_ISINFL 1 2025-05-07T19:46:54.5694582Z #define _GLIBCXX_HAVE_ISNAN 1 2025-05-07T19:46:54.5694842Z #define _GLIBCXX_HAVE_ISNANF 1 2025-05-07T19:46:54.5695089Z #define _GLIBCXX_HAVE_ISNANL 1 2025-05-07T19:46:54.5695355Z #define _GLIBCXX_HAVE_ISWBLANK 1 2025-05-07T19:46:54.5695618Z #define _GLIBCXX_HAVE_LC_MESSAGES 1 2025-05-07T19:46:54.5695901Z #define _GLIBCXX_HAVE_LDEXPF 1 2025-05-07T19:46:54.5696150Z #define _GLIBCXX_HAVE_LDEXPL 1 2025-05-07T19:46:54.5696420Z #define _GLIBCXX_HAVE_LIMIT_AS 1 2025-05-07T19:46:54.5696694Z #define _GLIBCXX_HAVE_LIMIT_DATA 1 2025-05-07T19:46:54.5696966Z #define _GLIBCXX_HAVE_LIMIT_FSIZE 1 2025-05-07T19:46:54.5697331Z #define _GLIBCXX_HAVE_LIMIT_RSS 1 2025-05-07T19:46:54.5697774Z #define _GLIBCXX_HAVE_LIMIT_VMEM 0 2025-05-07T19:46:54.5698080Z #define _GLIBCXX_HAVE_LINK 1 2025-05-07T19:46:54.5698351Z #define _GLIBCXX_HAVE_LINUX_FUTEX 1 2025-05-07T19:46:54.5698665Z #define _GLIBCXX_HAVE_LINUX_RANDOM_H 1 2025-05-07T19:46:54.5698969Z #define _GLIBCXX_HAVE_LINUX_TYPES_H 1 2025-05-07T19:46:54.5699281Z #define _GLIBCXX_HAVE_LOCALE_H 1 
2025-05-07T19:46:54.5699558Z #define _GLIBCXX_HAVE_LOG10F 1 2025-05-07T19:46:54.5699843Z #define _GLIBCXX_HAVE_LOG10L 1 2025-05-07T19:46:54.5700127Z #define _GLIBCXX_HAVE_LOGF 1 2025-05-07T19:46:54.5700388Z #define _GLIBCXX_HAVE_LOGL 1 2025-05-07T19:46:54.5700668Z #define _GLIBCXX_HAVE_MBSTATE_T 1 2025-05-07T19:46:54.5700950Z #define _GLIBCXX_HAVE_MEMALIGN 1 2025-05-07T19:46:54.5701254Z #define _GLIBCXX_HAVE_MEMORY_H 1 2025-05-07T19:46:54.5701524Z #define _GLIBCXX_HAVE_MODF 1 2025-05-07T19:46:54.5701801Z #define _GLIBCXX_HAVE_MODFF 1 2025-05-07T19:46:54.5702066Z #define _GLIBCXX_HAVE_MODFL 1 2025-05-07T19:46:54.5702345Z #define _GLIBCXX_HAVE_NETDB_H 1 2025-05-07T19:46:54.5702705Z #define _GLIBCXX_HAVE_NETINET_IN_H 1 2025-05-07T19:46:54.5703019Z #define _GLIBCXX_HAVE_NETINET_TCP_H 1 2025-05-07T19:46:54.5703339Z #define _GLIBCXX_HAVE_OBSOLETE_ISINF 1 2025-05-07T19:46:54.5703657Z #define _GLIBCXX_HAVE_OBSOLETE_ISNAN 1 2025-05-07T19:46:54.5703968Z #define _GLIBCXX_HAVE_POLL 1 2025-05-07T19:46:54.5704235Z #define _GLIBCXX_HAVE_POLL_H 1 2025-05-07T19:46:54.5704531Z #define _GLIBCXX_HAVE_POSIX_MEMALIGN 1 2025-05-07T19:46:54.5704837Z #define _GLIBCXX_HAVE_POSIX_SEMAPHORE 1 2025-05-07T19:46:54.5705152Z #define _GLIBCXX_HAVE_POWF 1 2025-05-07T19:46:54.5705415Z #define _GLIBCXX_HAVE_POWL 1 2025-05-07T19:46:54.5705697Z #define _GLIBCXX_HAVE_QUICK_EXIT 1 2025-05-07T19:46:54.5705983Z #define _GLIBCXX_HAVE_READLINK 1 2025-05-07T19:46:54.5706271Z #define _GLIBCXX_HAVE_SETENV 1 2025-05-07T19:46:54.5706551Z #define _GLIBCXX_HAVE_SINCOS 1 2025-05-07T19:46:54.5706824Z #define _GLIBCXX_HAVE_SINCOSF 1 2025-05-07T19:46:54.5707107Z #define _GLIBCXX_HAVE_SINCOSL 1 2025-05-07T19:46:54.5707376Z #define _GLIBCXX_HAVE_SINF 1 2025-05-07T19:46:54.5707655Z #define _GLIBCXX_HAVE_SINHF 1 2025-05-07T19:46:54.5707921Z #define _GLIBCXX_HAVE_SINHL 1 2025-05-07T19:46:54.5708198Z #define _GLIBCXX_HAVE_SINL 1 2025-05-07T19:46:54.5708466Z #define _GLIBCXX_HAVE_SOCKATMARK 1 2025-05-07T19:46:54.5708764Z 
#define _GLIBCXX_HAVE_SQRTF 1 2025-05-07T19:46:54.5709030Z #define _GLIBCXX_HAVE_SQRTL 1 2025-05-07T19:46:54.5709331Z #define _GLIBCXX_HAVE_STDALIGN_H 1 2025-05-07T19:46:54.5709630Z #define _GLIBCXX_HAVE_STDBOOL_H 1 2025-05-07T19:46:54.5710012Z #define _GLIBCXX_HAVE_STDINT_H 1 2025-05-07T19:46:54.5710289Z #define _GLIBCXX_HAVE_STDLIB_H 1 2025-05-07T19:46:54.5710551Z #define _GLIBCXX_HAVE_STRERROR_L 1 2025-05-07T19:46:54.5710834Z #define _GLIBCXX_HAVE_STRERROR_R 1 2025-05-07T19:46:54.5711102Z #define _GLIBCXX_HAVE_STRINGS_H 1 2025-05-07T19:46:54.5711379Z #define _GLIBCXX_HAVE_STRING_H 1 2025-05-07T19:46:54.5711633Z #define _GLIBCXX_HAVE_STRTOF 1 2025-05-07T19:46:54.5711901Z #define _GLIBCXX_HAVE_STRTOLD 1 2025-05-07T19:46:54.5712256Z #define _GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE 1 2025-05-07T19:46:54.5712570Z #define _GLIBCXX_HAVE_STRXFRM_L 1 2025-05-07T19:46:54.5712851Z #define _GLIBCXX_HAVE_SYMLINK 1 2025-05-07T19:46:54.5713179Z #define _GLIBCXX_HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT 1 2025-05-07T19:46:54.5713578Z #define _GLIBCXX_HAVE_SYS_IOCTL_H 1 2025-05-07T19:46:54.5713851Z #define _GLIBCXX_HAVE_SYS_IPC_H 1 2025-05-07T19:46:54.5714132Z #define _GLIBCXX_HAVE_SYS_PARAM_H 1 2025-05-07T19:46:54.5714410Z #define _GLIBCXX_HAVE_SYS_RESOURCE_H 1 2025-05-07T19:46:54.5714705Z #define _GLIBCXX_HAVE_SYS_SEM_H 1 2025-05-07T19:46:54.5714976Z #define _GLIBCXX_HAVE_SYS_SOCKET_H 1 2025-05-07T19:46:54.5715291Z #define _GLIBCXX_HAVE_SYS_STATVFS_H 1 2025-05-07T19:46:54.5715597Z #define _GLIBCXX_HAVE_SYS_STAT_H 1 2025-05-07T19:46:54.5715876Z #define _GLIBCXX_HAVE_SYS_SYSINFO_H 1 2025-05-07T19:46:54.5716183Z #define _GLIBCXX_HAVE_SYS_TIME_H 1 2025-05-07T19:46:54.5716457Z #define _GLIBCXX_HAVE_SYS_TYPES_H 1 2025-05-07T19:46:54.5717096Z #define _GLIBCXX_HAVE_SYS_UIO_H 1 2025-05-07T19:46:54.5717382Z #define _GLIBCXX_HAVE_S_ISREG 1 2025-05-07T19:46:54.5717662Z #define _GLIBCXX_HAVE_TANF 1 2025-05-07T19:46:54.5717924Z #define _GLIBCXX_HAVE_TANHF 1 2025-05-07T19:46:54.5718192Z #define 
_GLIBCXX_HAVE_TANHL 1 2025-05-07T19:46:54.5718460Z #define _GLIBCXX_HAVE_TANL 1 2025-05-07T19:46:54.5718717Z #define _GLIBCXX_HAVE_TGMATH_H 1 2025-05-07T19:46:54.5718984Z #define _GLIBCXX_HAVE_TLS 1 2025-05-07T19:46:54.5719227Z #define _GLIBCXX_HAVE_TRUNCATE 1 2025-05-07T19:46:54.5719500Z #define _GLIBCXX_HAVE_UNISTD_H 1 2025-05-07T19:46:54.5719755Z #define _GLIBCXX_HAVE_USELOCALE 1 2025-05-07T19:46:54.5720032Z #define _GLIBCXX_HAVE_UTIME_H 1 2025-05-07T19:46:54.5720284Z #define _GLIBCXX_HAVE_VFWSCANF 1 2025-05-07T19:46:54.5720554Z #define _GLIBCXX_HAVE_VSWSCANF 1 2025-05-07T19:46:54.5720805Z #define _GLIBCXX_HAVE_VWSCANF 1 2025-05-07T19:46:54.5721067Z #define _GLIBCXX_HAVE_WCHAR_H 1 2025-05-07T19:46:54.5721331Z #define _GLIBCXX_HAVE_WCSTOF 1 2025-05-07T19:46:54.5721680Z #define _GLIBCXX_HAVE_WCTYPE_H 1 2025-05-07T19:46:54.5721953Z #define _GLIBCXX_HAVE_WRITEV 1 2025-05-07T19:46:54.5722205Z #define _GLIBCXX_HAVE_XLOCALE_H 1 2025-05-07T19:46:54.5722479Z #define _GLIBCXX_HOSTED 1 2025-05-07T19:46:54.5722714Z #define _GLIBCXX_ICONV_CONST 2025-05-07T19:46:54.5722983Z #define _GLIBCXX_INLINE_VERSION 0 2025-05-07T19:46:54.5723244Z #define _GLIBCXX_LT_OBJDIR ".libs/" 2025-05-07T19:46:54.5723738Z #define _GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR(_Iter) std::__make_move_if_noexcept_iterator(_Iter) 2025-05-07T19:46:54.5724336Z #define _GLIBCXX_MAKE_MOVE_ITERATOR(_Iter) std::make_move_iterator(_Iter) 2025-05-07T19:46:54.5724754Z #define _GLIBCXX_MANGLE_SIZE_T m 2025-05-07T19:46:54.5725030Z #define _GLIBCXX_MATH_H 1 2025-05-07T19:46:54.5725293Z #define _GLIBCXX_MOVE(__val) std::move(__val) 2025-05-07T19:46:54.5725675Z #define _GLIBCXX_MOVE3(_Tp,_Up,_Vp) std::move(_Tp, _Up, _Vp) 2025-05-07T19:46:54.5726156Z #define _GLIBCXX_MOVE_BACKWARD3(_Tp,_Up,_Vp) std::move_backward(_Tp, _Up, _Vp) 2025-05-07T19:46:54.5726607Z #define _GLIBCXX_NAMESPACE_CXX11 __cxx11:: 2025-05-07T19:46:54.5726898Z #define _GLIBCXX_NAMESPACE_LDBL 2025-05-07T19:46:54.5727261Z #define 
_GLIBCXX_NAMESPACE_LDBL_OR_CXX11 _GLIBCXX_NAMESPACE_CXX11 2025-05-07T19:46:54.5727806Z #define _GLIBCXX_NATIVE_THREAD_ID (__gthread_active_p() ? __gthread_self() : (__gthread_t)1) 2025-05-07T19:46:54.5728285Z #define _GLIBCXX_NODISCARD [[__nodiscard__]] 2025-05-07T19:46:54.5728600Z #define _GLIBCXX_NOEXCEPT noexcept 2025-05-07T19:46:54.5728916Z #define _GLIBCXX_NOEXCEPT_IF(...) noexcept(__VA_ARGS__) 2025-05-07T19:46:54.5729272Z #define _GLIBCXX_NOEXCEPT_PARM , bool _NE 2025-05-07T19:46:54.5729581Z #define _GLIBCXX_NOEXCEPT_QUAL noexcept (_NE) 2025-05-07T19:46:54.5729945Z #define _GLIBCXX_NORETURN __attribute__ ((__noreturn__)) 2025-05-07T19:46:54.5730301Z #define _GLIBCXX_NOTHROW _GLIBCXX_USE_NOEXCEPT 2025-05-07T19:46:54.5730714Z #define _GLIBCXX_NO_OBSOLETE_ISINF_ISNAN_DYNAMIC __GLIBC_PREREQ(2,23) 2025-05-07T19:46:54.5731183Z #define _GLIBCXX_NUMERIC_LIMITS 1 2025-05-07T19:46:54.5731451Z #define _GLIBCXX_OS_DEFINES 1 2025-05-07T19:46:54.5731724Z #define _GLIBCXX_PACKAGE_BUGREPORT "" 2025-05-07T19:46:54.5732031Z #define _GLIBCXX_PACKAGE_NAME "package-unused" 2025-05-07T19:46:54.5732438Z #define _GLIBCXX_PACKAGE_STRING "package-unused version-unused" 2025-05-07T19:46:54.5732828Z #define _GLIBCXX_PACKAGE_TARNAME "libstdc++" 2025-05-07T19:46:54.5733142Z #define _GLIBCXX_PACKAGE_URL "" 2025-05-07T19:46:54.5733463Z #define _GLIBCXX_PACKAGE__GLIBCXX_VERSION "version-unused" 2025-05-07T19:46:54.5733826Z #define _GLIBCXX_PREDEFINED_OPS_H 1 2025-05-07T19:46:54.5734119Z #define _GLIBCXX_PSEUDO_VISIBILITY(V) 2025-05-07T19:46:54.5734424Z #define _GLIBCXX_PURE __attribute__ ((__pure__)) 2025-05-07T19:46:54.5734742Z #define _GLIBCXX_RELEASE 11 2025-05-07T19:46:54.5734987Z #define _GLIBCXX_RES_LIMITS 1 2025-05-07T19:46:54.5735248Z #define _GLIBCXX_STDC_HEADERS 1 2025-05-07T19:46:54.5735504Z #define _GLIBCXX_STDIO_EOF -1 2025-05-07T19:46:54.5735774Z #define _GLIBCXX_STDIO_SEEK_CUR 1 2025-05-07T19:46:54.5736041Z #define _GLIBCXX_STDIO_SEEK_END 2 2025-05-07T19:46:54.5736312Z #define 
_GLIBCXX_STDLIB_H 1 2025-05-07T19:46:54.5736550Z #define _GLIBCXX_STD_A std 2025-05-07T19:46:54.5736796Z #define _GLIBCXX_STD_C std 2025-05-07T19:46:54.5737041Z #define _GLIBCXX_SYMVER 1 2025-05-07T19:46:54.5737349Z #define _GLIBCXX_SYMVER_GNU 1 2025-05-07T19:46:54.5737830Z #define _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(A) 2025-05-07T19:46:54.5738275Z #define _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(A) 2025-05-07T19:46:54.5738634Z #define _GLIBCXX_THROW(_EXC) 2025-05-07T19:46:54.5738942Z #define _GLIBCXX_THROW_OR_ABORT(_EXC) (throw (_EXC)) 2025-05-07T19:46:54.5739314Z #define _GLIBCXX_TR1_BESSEL_FUNCTION_TCC 1 2025-05-07T19:46:54.5739639Z #define _GLIBCXX_TR1_BETA_FUNCTION_TCC 1 2025-05-07T19:46:54.5739964Z #define _GLIBCXX_TR1_ELL_INTEGRAL_TCC 1 2025-05-07T19:46:54.5740291Z #define _GLIBCXX_TR1_EXP_INTEGRAL_TCC 1 2025-05-07T19:46:54.5740590Z #define _GLIBCXX_TR1_GAMMA_TCC 1 2025-05-07T19:46:54.5740977Z #define _GLIBCXX_TR1_HYPERGEOMETRIC_TCC 1 2025-05-07T19:46:54.5741314Z #define _GLIBCXX_TR1_LEGENDRE_FUNCTION_TCC 1 2025-05-07T19:46:54.5741677Z #define _GLIBCXX_TR1_MODIFIED_BESSEL_FUNC_TCC 1 2025-05-07T19:46:54.5742016Z #define _GLIBCXX_TR1_POLY_HERMITE_TCC 1 2025-05-07T19:46:54.5742349Z #define _GLIBCXX_TR1_POLY_LAGUERRE_TCC 1 2025-05-07T19:46:54.5742663Z #define _GLIBCXX_TR1_RIEMANN_ZETA_TCC 1 2025-05-07T19:46:54.5743008Z #define _GLIBCXX_TR1_SPECIAL_FUNCTION_UTIL_H 1 2025-05-07T19:46:54.5743328Z #define _GLIBCXX_TXN_SAFE 2025-05-07T19:46:54.5743605Z #define _GLIBCXX_TXN_SAFE_DYN 2025-05-07T19:46:54.5743934Z #define _GLIBCXX_TYPE_TRAITS 1 2025-05-07T19:46:54.5744242Z #define _GLIBCXX_USE_ALLOCATOR_NEW 1 2025-05-07T19:46:54.5744594Z #define _GLIBCXX_USE_C99 1 2025-05-07T19:46:54.5744943Z #define _GLIBCXX_USE_C99_COMPLEX _GLIBCXX11_USE_C99_COMPLEX 2025-05-07T19:46:54.5745381Z #define _GLIBCXX_USE_C99_COMPLEX_TR1 1 2025-05-07T19:46:54.5745709Z #define _GLIBCXX_USE_C99_CTYPE_TR1 1 2025-05-07T19:46:54.5746022Z #define _GLIBCXX_USE_C99_FENV_TR1 1 
2025-05-07T19:46:54.5746319Z #define _GLIBCXX_USE_C99_INTTYPES_TR1 1 2025-05-07T19:46:54.5746665Z #define _GLIBCXX_USE_C99_INTTYPES_WCHAR_T_TR1 1 2025-05-07T19:46:54.5747056Z #define _GLIBCXX_USE_C99_MATH _GLIBCXX11_USE_C99_MATH 2025-05-07T19:46:54.5747405Z #define _GLIBCXX_USE_C99_MATH_TR1 1 2025-05-07T19:46:54.5747722Z #define _GLIBCXX_USE_C99_STDINT_TR1 1 2025-05-07T19:46:54.5748072Z #define _GLIBCXX_USE_C99_STDIO _GLIBCXX11_USE_C99_STDIO 2025-05-07T19:46:54.5748503Z #define _GLIBCXX_USE_C99_STDLIB _GLIBCXX11_USE_C99_STDLIB 2025-05-07T19:46:54.5748916Z #define _GLIBCXX_USE_C99_WCHAR _GLIBCXX11_USE_C99_WCHAR 2025-05-07T19:46:54.5749297Z #define _GLIBCXX_USE_CLOCK_MONOTONIC 1 2025-05-07T19:46:54.5749609Z #define _GLIBCXX_USE_CLOCK_REALTIME 1 2025-05-07T19:46:54.5750037Z #define _GLIBCXX_USE_CONSTEXPR constexpr 2025-05-07T19:46:54.5750347Z #define _GLIBCXX_USE_CXX11_ABI 1 2025-05-07T19:46:54.5750613Z #define _GLIBCXX_USE_DECIMAL_FLOAT 1 2025-05-07T19:46:54.5750981Z #define _GLIBCXX_USE_DEPRECATED 1 2025-05-07T19:46:54.5751287Z #define _GLIBCXX_USE_DEV_RANDOM 1 2025-05-07T19:46:54.5751619Z #define _GLIBCXX_USE_DUAL_ABI 1 2025-05-07T19:46:54.5751904Z #define _GLIBCXX_USE_FCHMOD 1 2025-05-07T19:46:54.5752180Z #define _GLIBCXX_USE_FCHMODAT 1 2025-05-07T19:46:54.5752459Z #define _GLIBCXX_USE_FLOAT128 1 2025-05-07T19:46:54.5752784Z #define _GLIBCXX_USE_GETTIMEOFDAY 1 2025-05-07T19:46:54.5753094Z #define _GLIBCXX_USE_GET_NPROCS 1 2025-05-07T19:46:54.5753416Z #define _GLIBCXX_USE_INT128 1 2025-05-07T19:46:54.5753725Z #define _GLIBCXX_USE_LFS 1 2025-05-07T19:46:54.5753996Z #define _GLIBCXX_USE_LONG_LONG 1 2025-05-07T19:46:54.5754302Z #define _GLIBCXX_USE_LSTAT 1 2025-05-07T19:46:54.5754576Z #define _GLIBCXX_USE_NANOSLEEP 1 2025-05-07T19:46:54.5754888Z #define _GLIBCXX_USE_NOEXCEPT noexcept 2025-05-07T19:46:54.5755187Z #define _GLIBCXX_USE_PTHREAD_RWLOCK_T 1 2025-05-07T19:46:54.5755502Z #define _GLIBCXX_USE_RANDOM_TR1 1 2025-05-07T19:46:54.5755768Z #define 
_GLIBCXX_USE_REALPATH 1 2025-05-07T19:46:54.5756039Z #define _GLIBCXX_USE_SCHED_YIELD 1 2025-05-07T19:46:54.5756319Z #define _GLIBCXX_USE_SC_NPROCESSORS_ONLN 1 2025-05-07T19:46:54.5756625Z #define _GLIBCXX_USE_SENDFILE 1 2025-05-07T19:46:54.5756897Z #define _GLIBCXX_USE_STD_SPEC_FUNCS 1 2025-05-07T19:46:54.5757169Z #define _GLIBCXX_USE_ST_MTIM 1 2025-05-07T19:46:54.5757505Z #define _GLIBCXX_USE_TBB_PAR_BACKEND __has_include(<tbb/tbb.h>) 2025-05-07T19:46:54.5757865Z #define _GLIBCXX_USE_TMPNAM 1 2025-05-07T19:46:54.5758134Z #define _GLIBCXX_USE_UTIME 1 2025-05-07T19:46:54.5781072Z #define _GLIBCXX_USE_UTIMENSAT 1 2025-05-07T19:46:54.5781465Z #define _GLIBCXX_USE_WCHAR_T 1 2025-05-07T19:46:54.5781797Z #define _GLIBCXX_USE_WEAK_REF __GXX_WEAK__ 2025-05-07T19:46:54.5782171Z #define _GLIBCXX_UTILITY 1 2025-05-07T19:46:54.5782464Z #define _GLIBCXX_VERBOSE 1 2025-05-07T19:46:54.5782890Z #define _GLIBCXX_VISIBILITY(V) __attribute__ ((__visibility__ (#V))) 2025-05-07T19:46:54.5783369Z #define _GLIBCXX_WEAK_DEFINITION 2025-05-07T19:46:54.5783891Z #define _GLIBCXX_X86_RDRAND 1 2025-05-07T19:46:54.5784227Z #define _GLIBCXX_X86_RDSEED 1 2025-05-07T19:46:54.5784519Z #define _GNU_SOURCE 1 2025-05-07T19:46:54.5784832Z #define _GTHREAD_USE_MUTEX_TIMEDLOCK 1 2025-05-07T19:46:54.5785156Z #define _G_BUFSIZ 8192 2025-05-07T19:46:54.5785454Z #define _G_HAVE_MMAP 1 2025-05-07T19:46:54.5785725Z #define _G_HAVE_MREMAP 1 2025-05-07T19:46:54.5786099Z #define _G_HAVE_ST_BLKSIZE defined (_STATBUF_ST_BLKSIZE) 2025-05-07T19:46:54.5786499Z #define _G_IO_IO_FILE_VERSION 0x20001 2025-05-07T19:46:54.5786845Z #define _G_config_h 1 2025-05-07T19:46:54.5787150Z #define _G_va_list __gnuc_va_list 2025-05-07T19:46:54.5787465Z #define _INITIALIZER_LIST 2025-05-07T19:46:54.5787772Z #define _IOFBF 0 2025-05-07T19:46:54.5788008Z #define _IOLBF 1 2025-05-07T19:46:54.5788276Z #define _IONBF 2 2025-05-07T19:46:54.5788517Z #define _IOS_APPEND 8 2025-05-07T19:46:54.5788811Z #define _IOS_ATEND 4 2025-05-07T19:46:54.5789167Z 
#define _IOS_BIN 128 2025-05-07T19:46:54.5789448Z #define _IOS_INPUT 1 2025-05-07T19:46:54.5789703Z #define _IOS_NOCREATE 32 2025-05-07T19:46:54.5789999Z #define _IOS_NOREPLACE 64 2025-05-07T19:46:54.5790263Z #define _IOS_OUTPUT 2 2025-05-07T19:46:54.5790542Z #define _IOS_TRUNC 16 2025-05-07T19:46:54.5790826Z #define _IO_BAD_SEEN 0x4000 2025-05-07T19:46:54.5791162Z #define _IO_BE(expr,res) __builtin_expect ((expr), res) 2025-05-07T19:46:54.5791567Z #define _IO_BOOLALPHA 0200000 2025-05-07T19:46:54.5791844Z #define _IO_BUFSIZ _G_BUFSIZ 2025-05-07T19:46:54.5792167Z #define _IO_CURRENTLY_PUTTING 0x800 2025-05-07T19:46:54.5792469Z #define _IO_DEC 020 2025-05-07T19:46:54.5792760Z #define _IO_DELETE_DONT_CLOSE 0x40 2025-05-07T19:46:54.5793171Z #define _IO_DONT_CLOSE 0100000 2025-05-07T19:46:54.5793478Z #define _IO_EOF_SEEN 0x10 2025-05-07T19:46:54.5793740Z #define _IO_ERR_SEEN 0x20 2025-05-07T19:46:54.5794032Z #define _IO_FIXED 010000 2025-05-07T19:46:54.5794323Z #define _IO_FLAGS2_MMAP 1 2025-05-07T19:46:54.5794586Z #define _IO_FLAGS2_NOTCANCEL 2 2025-05-07T19:46:54.5795016Z #define _IO_FLAGS2_USER_WBUF 8 2025-05-07T19:46:54.5795327Z #define _IO_HAVE_ST_BLKSIZE _G_HAVE_ST_BLKSIZE 2025-05-07T19:46:54.5795683Z #define _IO_HEX 0100 2025-05-07T19:46:54.5795934Z #define _IO_INTERNAL 010 2025-05-07T19:46:54.5796231Z #define _IO_IN_BACKUP 0x100 2025-05-07T19:46:54.5796514Z #define _IO_IS_APPENDING 0x1000 2025-05-07T19:46:54.5796833Z #define _IO_IS_FILEBUF 0x2000 2025-05-07T19:46:54.5797094Z #define _IO_LEFT 02 2025-05-07T19:46:54.5797364Z #define _IO_LINE_BUF 0x200 2025-05-07T19:46:54.5797623Z #define _IO_LINKED 0x80 2025-05-07T19:46:54.5797908Z #define _IO_MAGIC 0xFBAD0000 2025-05-07T19:46:54.5798210Z #define _IO_MAGIC_MASK 0xFFFF0000 2025-05-07T19:46:54.5798489Z #define _IO_NO_READS 4 2025-05-07T19:46:54.5798764Z #define _IO_NO_WRITES 8 2025-05-07T19:46:54.5799006Z #define _IO_OCT 040 2025-05-07T19:46:54.5799422Z #define _IO_PENDING_OUTPUT_COUNT(_fp) ((_fp)->_IO_write_ptr - 
(_fp)->_IO_write_base) 2025-05-07T19:46:54.5799871Z #define _IO_RIGHT 04 2025-05-07T19:46:54.5800166Z #define _IO_SCIENTIFIC 04000 2025-05-07T19:46:54.5800442Z #define _IO_SHOWBASE 0200 2025-05-07T19:46:54.5800726Z #define _IO_SHOWPOINT 0400 2025-05-07T19:46:54.5800976Z #define _IO_SHOWPOS 02000 2025-05-07T19:46:54.5801255Z #define _IO_SKIPWS 01 2025-05-07T19:46:54.5801493Z #define _IO_STDIO 040000 2025-05-07T19:46:54.5801713Z #define _IO_STDIO_H 2025-05-07T19:46:54.5801965Z #define _IO_TIED_PUT_GET 0x400 2025-05-07T19:46:54.5802240Z #define _IO_UNBUFFERED 2 2025-05-07T19:46:54.5802530Z #define _IO_UNIFIED_JUMPTABLES 1 2025-05-07T19:46:54.5802818Z #define _IO_UNITBUF 020000 2025-05-07T19:46:54.5803117Z #define _IO_UPPERCASE 01000 2025-05-07T19:46:54.5803383Z #define _IO_USER_BUF 1 2025-05-07T19:46:54.5803662Z #define _IO_USER_LOCK 0x8000 2025-05-07T19:46:54.5803946Z #define _IO_cleanup_region_end(_Doit) 2025-05-07T19:46:54.5804299Z #define _IO_cleanup_region_start(_fct,_fp) 2025-05-07T19:46:54.5804735Z #define _IO_feof_unlocked(__fp) (((__fp)->_flags & _IO_EOF_SEEN) != 0) 2025-05-07T19:46:54.5805349Z #define _IO_ferror_unlocked(__fp) (((__fp)->_flags & _IO_ERR_SEEN) != 0) 2025-05-07T19:46:54.5805794Z #define _IO_file_flags _flags 2025-05-07T19:46:54.5806074Z #define _IO_flockfile(_fp) 2025-05-07T19:46:54.5806374Z #define _IO_fpos64_t _G_fpos64_t 2025-05-07T19:46:54.5806667Z #define _IO_fpos_t _G_fpos_t 2025-05-07T19:46:54.5806964Z #define _IO_ftrylockfile(_fp) 2025-05-07T19:46:54.5807243Z #define _IO_funlockfile(_fp) 2025-05-07T19:46:54.5807813Z #define _IO_getc_unlocked(_fp) (_IO_BE ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end, 0) ? 
__uflow (_fp) : *(unsigned char *) (_fp)->_IO_read_ptr++) 2025-05-07T19:46:54.5808401Z #define _IO_iconv_t _G_iconv_t 2025-05-07T19:46:54.5808672Z #define _IO_off64_t __off64_t 2025-05-07T19:46:54.5808967Z #define _IO_off_t __off_t 2025-05-07T19:46:54.5809253Z #define _IO_peekc(_fp) _IO_peekc_unlocked (_fp) 2025-05-07T19:46:54.5809916Z #define _IO_peekc_unlocked(_fp) (_IO_BE ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end, 0) && __underflow (_fp) == EOF ? EOF : *(unsigned char *) (_fp)->_IO_read_ptr) 2025-05-07T19:46:54.5810525Z #define _IO_pid_t __pid_t 2025-05-07T19:46:54.5811185Z #define _IO_putc_unlocked(_ch,_fp) (_IO_BE ((_fp)->_IO_write_ptr >= (_fp)->_IO_write_end, 0) ? __overflow (_fp, (unsigned char) (_ch)) : (unsigned char) (*(_fp)->_IO_write_ptr++ = (_ch))) 2025-05-07T19:46:54.5811884Z #define _IO_size_t size_t 2025-05-07T19:46:54.5812143Z #define _IO_ssize_t __ssize_t 2025-05-07T19:46:54.5812484Z #define _IO_stderr ((_IO_FILE*)(&_IO_2_1_stderr_)) 2025-05-07T19:46:54.5812846Z #define _IO_stdin ((_IO_FILE*)(&_IO_2_1_stdin_)) 2025-05-07T19:46:54.5813240Z #define _IO_stdout ((_IO_FILE*)(&_IO_2_1_stdout_)) 2025-05-07T19:46:54.5813572Z #define _IO_uid_t __uid_t 2025-05-07T19:46:54.5813863Z #define _IO_va_list __gnuc_va_list 2025-05-07T19:46:54.5814149Z #define _IO_wint_t wint_t 2025-05-07T19:46:54.5814435Z #define _ISOC11_SOURCE 1 2025-05-07T19:46:54.5814688Z #define _ISOC95_SOURCE 1 2025-05-07T19:46:54.5814966Z #define _ISOC99_SOURCE 1 2025-05-07T19:46:54.5815396Z #define _ISbit(bit) ((bit) < 8 ? 
((1 << (bit)) << 8) : ((1 << (bit)) >> 8)) 2025-05-07T19:46:54.5815793Z #define _LARGEFILE64_SOURCE 1 2025-05-07T19:46:54.5816110Z #define _LARGEFILE_SOURCE 1 2025-05-07T19:46:54.5816382Z #define _LIBC_LIMITS_H_ 1 2025-05-07T19:46:54.5816678Z #define _LINUX_LIMITS_H 2025-05-07T19:46:54.5816930Z #define _LP64 1 2025-05-07T19:46:54.5817281Z #define _MATH_H 1 2025-05-07T19:46:54.5817692Z #define _MATH_H_MATHDEF 1 2025-05-07T19:46:54.5817997Z #define _MOVE_H 1 2025-05-07T19:46:54.5818315Z #define _Mfloat_ float 2025-05-07T19:46:54.5818621Z #define _Mlong_double_ long double 2025-05-07T19:46:54.5818951Z #define _NEW 2025-05-07T19:46:54.5819202Z #define _OLD_STDIO_MAGIC 0xFABC0000 2025-05-07T19:46:54.5819541Z #define _POSIX2_BC_BASE_MAX 99 2025-05-07T19:46:54.5819840Z #define _POSIX2_BC_DIM_MAX 2048 2025-05-07T19:46:54.5820164Z #define _POSIX2_BC_SCALE_MAX 99 2025-05-07T19:46:54.5820462Z #define _POSIX2_BC_STRING_MAX 1000 2025-05-07T19:46:54.5820805Z #define _POSIX2_CHARCLASS_NAME_MAX 14 2025-05-07T19:46:54.5821136Z #define _POSIX2_COLL_WEIGHTS_MAX 2 2025-05-07T19:46:54.5821476Z #define _POSIX2_EXPR_NEST_MAX 32 2025-05-07T19:46:54.5821772Z #define _POSIX2_LINE_MAX 2048 2025-05-07T19:46:54.5822089Z #define _POSIX2_RE_DUP_MAX 255 2025-05-07T19:46:54.5822412Z #define _POSIX_AIO_LISTIO_MAX 2 2025-05-07T19:46:54.5822703Z #define _POSIX_AIO_MAX 1 2025-05-07T19:46:54.5823001Z #define _POSIX_ARG_MAX 4096 2025-05-07T19:46:54.5823281Z #define _POSIX_CHILD_MAX 25 2025-05-07T19:46:54.5823599Z #define _POSIX_CLOCKRES_MIN 20000000 2025-05-07T19:46:54.5823919Z #define _POSIX_C_SOURCE 200809L 2025-05-07T19:46:54.5824238Z #define _POSIX_DELAYTIMER_MAX 32 2025-05-07T19:46:54.5824554Z #define _POSIX_FD_SETSIZE _POSIX_OPEN_MAX 2025-05-07T19:46:54.5824923Z #define _POSIX_HIWAT _POSIX_PIPE_BUF 2025-05-07T19:46:54.5825238Z #define _POSIX_HOST_NAME_MAX 255 2025-05-07T19:46:54.5825558Z #define _POSIX_LINK_MAX 8 2025-05-07T19:46:54.5825863Z #define _POSIX_LOGIN_NAME_MAX 9 
2025-05-07T19:46:54.5826163Z #define _POSIX_MAX_CANON 255 2025-05-07T19:46:54.5826553Z #define _POSIX_MAX_INPUT 255 2025-05-07T19:46:54.5826846Z #define _POSIX_MQ_OPEN_MAX 8 2025-05-07T19:46:54.5827165Z #define _POSIX_MQ_PRIO_MAX 32 2025-05-07T19:46:54.5827457Z #define _POSIX_NAME_MAX 14 2025-05-07T19:46:54.5827775Z #define _POSIX_NGROUPS_MAX 8 2025-05-07T19:46:54.5828066Z #define _POSIX_OPEN_MAX 20 2025-05-07T19:46:54.5828378Z #define _POSIX_PATH_MAX 256 2025-05-07T19:46:54.5828663Z #define _POSIX_PIPE_BUF 512 2025-05-07T19:46:54.5828975Z #define _POSIX_QLIMIT 1 2025-05-07T19:46:54.5829280Z #define _POSIX_RE_DUP_MAX 255 2025-05-07T19:46:54.5829578Z #define _POSIX_RTSIG_MAX 8 2025-05-07T19:46:54.5829993Z #define _POSIX_SEM_NSEMS_MAX 256 2025-05-07T19:46:54.5830283Z #define _POSIX_SEM_VALUE_MAX 32767 2025-05-07T19:46:54.5830608Z #define _POSIX_SIGQUEUE_MAX 32 2025-05-07T19:46:54.5830880Z #define _POSIX_SOURCE 1 2025-05-07T19:46:54.5831170Z #define _POSIX_SSIZE_MAX 32767 2025-05-07T19:46:54.5831451Z #define _POSIX_STREAM_MAX 8 2025-05-07T19:46:54.5831752Z #define _POSIX_SYMLINK_MAX 255 2025-05-07T19:46:54.5832030Z #define _POSIX_SYMLOOP_MAX 8 2025-05-07T19:46:54.5832360Z #define _POSIX_THREAD_DESTRUCTOR_ITERATIONS 4 2025-05-07T19:46:54.5832717Z #define _POSIX_THREAD_KEYS_MAX 128 2025-05-07T19:46:54.5833009Z #define _POSIX_THREAD_THREADS_MAX 64 2025-05-07T19:46:54.5833331Z #define _POSIX_TIMER_MAX 32 2025-05-07T19:46:54.5833607Z #define _POSIX_TTY_NAME_MAX 9 2025-05-07T19:46:54.5833919Z #define _POSIX_TZNAME_MAX 6 2025-05-07T19:46:54.5834192Z #define _POSIX_UIO_MAXIOV 16 2025-05-07T19:46:54.5834559Z #define _PSTL_ASSERT(_Condition) __glibcxx_assert(_Condition) 2025-05-07T19:46:54.5835054Z #define _PSTL_ASSERT_MSG(_Condition,_Message) __glibcxx_assert(_Condition) 2025-05-07T19:46:54.5835687Z #define _PSTL_CLANG_VERSION (__clang_major__ * 10000 + __clang_minor__ * 100 + __clang_patchlevel__) 2025-05-07T19:46:54.5836200Z #define _PSTL_CONFIG_H 
2025-05-07T19:46:54.5836663Z #define _PSTL_CPP11_STD_ROTATE_BROKEN ((__GLIBCXX__ && __GLIBCXX__ < 20150716) || (_MSC_VER && _MSC_VER < 1800)) 2025-05-07T19:46:54.5837596Z #define _PSTL_CPP14_2RANGE_MISMATCH_EQUAL_PRESENT (_MSC_VER >= 1900 || __cplusplus >= 201300L || __cpp_lib_robust_nonmodifying_seq_ops == 201304) 2025-05-07T19:46:54.5838387Z #define _PSTL_CPP14_INTEGER_SEQUENCE_PRESENT (_MSC_VER >= 1900 || __cplusplus >= 201402L) 2025-05-07T19:46:54.5839197Z #define _PSTL_CPP14_MAKE_REVERSE_ITERATOR_PRESENT (_MSC_VER >= 1900 || __cplusplus >= 201402L || __cpp_lib_make_reverse_iterator == 201402) 2025-05-07T19:46:54.5840187Z #define _PSTL_CPP14_VARIABLE_TEMPLATES_PRESENT (!__INTEL_COMPILER || __INTEL_COMPILER >= 1700) && (_MSC_FULL_VER >= 190023918 || __cplusplus >= 201402L) 2025-05-07T19:46:54.5840926Z #define _PSTL_CPP17_EXECUTION_POLICIES_PRESENT (_MSC_VER >= 1912) 2025-05-07T19:46:54.5841407Z #define _PSTL_EARLYEXIT_PRESENT (__INTEL_COMPILER >= 1800) 2025-05-07T19:46:54.5841903Z #define _PSTL_GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) 2025-05-07T19:46:54.5842378Z #define _PSTL_HIDE_FROM_ABI_POP 2025-05-07T19:46:54.5842674Z #define _PSTL_HIDE_FROM_ABI_PUSH 2025-05-07T19:46:54.5843058Z #define _PSTL_ICC_18_OMP_SIMD_BROKEN (__INTEL_COMPILER == 1800) 2025-05-07T19:46:54.5843519Z #define _PSTL_MONOTONIC_PRESENT (__INTEL_COMPILER >= 1800) 2025-05-07T19:46:54.5843890Z #define _PSTL_PAR_BACKEND_SERIAL 2025-05-07T19:46:54.5844211Z #define _PSTL_PRAGMA(x) _Pragma(# x) 2025-05-07T19:46:54.5844857Z #define _PSTL_PRAGMA_DECLARE_REDUCTION(NAME,OP) _PSTL_PRAGMA(omp declare reduction(NAME:OP : omp_out(omp_in)) initializer(omp_priv = omp_orig)) 2025-05-07T19:46:54.5845612Z #define _PSTL_PRAGMA_DECLARE_SIMD _PSTL_PRAGMA(omp declare simd) 2025-05-07T19:46:54.5846007Z #define _PSTL_PRAGMA_FORCEINLINE 2025-05-07T19:46:54.5846384Z #define _PSTL_PRAGMA_LOCATION " [Parallel STL message]: " 2025-05-07T19:46:54.5846752Z #define _PSTL_PRAGMA_MESSAGE(x) 
2025-05-07T19:46:54.5847280Z #define _PSTL_PRAGMA_MESSAGE_IMPL(x) _PSTL_PRAGMA(message(_PSTL_STRING_CONCAT(_PSTL_PRAGMA_LOCATION, x))) 2025-05-07T19:46:54.5847923Z #define _PSTL_PRAGMA_MESSAGE_POLICIES(x) 2025-05-07T19:46:54.5848280Z #define _PSTL_PRAGMA_SIMD _PSTL_PRAGMA(omp simd) 2025-05-07T19:46:54.5848646Z #define _PSTL_PRAGMA_SIMD_EARLYEXIT 2025-05-07T19:46:54.5848972Z #define _PSTL_PRAGMA_SIMD_EXCLUSIVE_SCAN(PRM) 2025-05-07T19:46:54.5849355Z #define _PSTL_PRAGMA_SIMD_INCLUSIVE_SCAN(PRM) 2025-05-07T19:46:54.5849718Z #define _PSTL_PRAGMA_SIMD_ORDERED_MONOTONIC(PRM) 2025-05-07T19:46:54.5850171Z #define _PSTL_PRAGMA_SIMD_ORDERED_MONOTONIC_2ARGS(PRM1,PRM2) 2025-05-07T19:46:54.5850724Z #define _PSTL_PRAGMA_SIMD_REDUCTION(PRM) _PSTL_PRAGMA(omp simd reduction(PRM)) 2025-05-07T19:46:54.5851173Z #define _PSTL_PRAGMA_SIMD_SCAN(PRM) 2025-05-07T19:46:54.5851522Z #define _PSTL_PRAGMA_VECTOR_UNALIGNED 2025-05-07T19:46:54.5851847Z #define _PSTL_STRING(x) _PSTL_STRING_AUX(x) 2025-05-07T19:46:54.5852190Z #define _PSTL_STRING_AUX(x) #x 2025-05-07T19:46:54.5852478Z #define _PSTL_STRING_CONCAT(x,y) x #y 2025-05-07T19:46:54.5852807Z #define _PSTL_UDR_PRESENT 0 2025-05-07T19:46:54.5853264Z #define _PSTL_UDS_PRESENT (__INTEL_COMPILER >= 1900 && __INTEL_COMPILER_BUILD_DATE >= 20180626) 2025-05-07T19:46:54.5853787Z #define _PSTL_USAGE_WARNINGS 0 2025-05-07T19:46:54.5854129Z #define _PSTL_USE_NONTEMPORAL_STORES_IF_ALLOWED 2025-05-07T19:46:54.5854465Z #define _PSTL_VERSION 12000 2025-05-07T19:46:54.5854801Z #define _PSTL_VERSION_MAJOR (_PSTL_VERSION / 1000) 2025-05-07T19:46:54.5855192Z #define _PSTL_VERSION_MINOR ((_PSTL_VERSION % 1000) / 10) 2025-05-07T19:46:54.5855610Z #define _PSTL_VERSION_PATCH (_PSTL_VERSION % 10) 2025-05-07T19:46:54.5855936Z #define _PTRDIFF_T 2025-05-07T19:46:54.5856201Z #define _PTR_TRAITS_H 1 2025-05-07T19:46:54.5856452Z #define _SIGSET_H_types 1 2025-05-07T19:46:54.5856814Z #define _SIGSET_NWORDS (1024 / (8 * sizeof (unsigned long int))) 
2025-05-07T19:46:54.5857298Z #define _SIZE_T 2025-05-07T19:46:54.5857700Z #define _STDC_PREDEF_H 1 2025-05-07T19:46:54.5858001Z #define _STDIO_H 1 2025-05-07T19:46:54.5858258Z #define _STDIO_USES_IOSTREAM 2025-05-07T19:46:54.5858646Z #define _STDLIB_H 1 2025-05-07T19:46:54.5858909Z #define _STL_ALGOBASE_H 1 2025-05-07T19:46:54.5859228Z #define _STL_ITERATOR_BASE_FUNCS_H 1 2025-05-07T19:46:54.5859552Z #define _STL_ITERATOR_BASE_TYPES_H 1 2025-05-07T19:46:54.5859899Z #define _STL_ITERATOR_H 1 2025-05-07T19:46:54.5860165Z #define _STL_PAIR_H 1 2025-05-07T19:46:54.5860448Z #define _STL_RELOPS_H 1 2025-05-07T19:46:54.5860703Z #define _STRING_H 1 2025-05-07T19:46:54.5860982Z #define _STRUCT_TIMEVAL 1 2025-05-07T19:46:54.5861276Z #define _SVID_SOURCE 1 2025-05-07T19:46:54.5861529Z #define _SYS_CDEFS_H 1 2025-05-07T19:46:54.5861823Z #define _SYS_SELECT_H 1 2025-05-07T19:46:54.5862089Z #define _SYS_SYSMACROS_H 1 2025-05-07T19:46:54.5862391Z #define _SYS_TYPES_H 1 2025-05-07T19:46:54.5862647Z #define _TIME_H 1 2025-05-07T19:46:54.5862924Z #define _VA_LIST_DEFINED 2025-05-07T19:46:54.5863207Z #define _XLOCALE_H 1 2025-05-07T19:46:54.5863484Z #define _XOPEN_IOV_MAX _POSIX_UIO_MAXIOV 2025-05-07T19:46:54.5863842Z #define _XOPEN_LIM_H 1 2025-05-07T19:46:54.5864111Z #define _XOPEN_SOURCE 700 2025-05-07T19:46:54.5864419Z #define _XOPEN_SOURCE_EXTENDED 1 2025-05-07T19:46:54.5864813Z #define __ASMNAME(cname) __ASMNAME2 (__USER_LABEL_PREFIX__, cname) 2025-05-07T19:46:54.5865329Z #define __ASMNAME2(prefix,cname) __STRING (prefix) cname 2025-05-07T19:46:54.5865743Z #define __ASSERT_FUNCTION __PRETTY_FUNCTION__ 2025-05-07T19:46:54.5866137Z #define __ASSERT_VOID_CAST static_cast<void> 2025-05-07T19:46:54.5866508Z #define __ATOMIC_ACQUIRE 2 2025-05-07T19:46:54.5866786Z #define __ATOMIC_ACQ_REL 4 2025-05-07T19:46:54.5867086Z #define __ATOMIC_CONSUME 1 2025-05-07T19:46:54.5867563Z #define __ATOMIC_RELAXED 0 2025-05-07T19:46:54.5867875Z #define __ATOMIC_RELEASE 3 2025-05-07T19:46:54.5868157Z 
#define __ATOMIC_SEQ_CST 5 2025-05-07T19:46:54.5868477Z #define __BEGIN_DECLS extern "C" { 2025-05-07T19:46:54.5868790Z #define __BEGIN_NAMESPACE_C99 2025-05-07T19:46:54.5869120Z #define __BEGIN_NAMESPACE_STD 2025-05-07T19:46:54.5869423Z #define __BIGGEST_ALIGNMENT__ 16 2025-05-07T19:46:54.5870974Z #define __BIG_ENDIAN 4321 2025-05-07T19:46:54.5871276Z #define __BITINT_MAXWIDTH__ 8388608 2025-05-07T19:46:54.5871627Z #define __BIT_TYPES_DEFINED__ 1 2025-05-07T19:46:54.5871969Z #define __BLKCNT64_T_TYPE __SQUAD_TYPE 2025-05-07T19:46:54.5872323Z #define __BLKCNT_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:54.5872722Z #define __BLKSIZE_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:54.5873066Z #define __BOOL_WIDTH__ 8 2025-05-07T19:46:54.5873388Z #define __BYTE_ORDER __LITTLE_ENDIAN 2025-05-07T19:46:54.5873737Z #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:46:54.5874118Z #define __CHANNEL_DESCRIPTOR_H__ 2025-05-07T19:46:54.5874432Z #define __CHAR16_TYPE__ unsigned short 2025-05-07T19:46:54.5874790Z #define __CHAR32_TYPE__ unsigned int 2025-05-07T19:46:54.5875119Z #define __CHAR_BIT__ 8 2025-05-07T19:46:54.5875386Z #define __CLANG_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:46:54.5875748Z #define __CLANG_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:46:54.5876103Z #define __CLANG_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:46:54.5876483Z #define __CLANG_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:46:54.5876815Z #define __CLANG_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:46:54.5877164Z #define __CLANG_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:46:54.5877495Z #define __CLANG_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:46:54.5877864Z #define __CLANG_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:46:54.5878240Z #define __CLANG_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:46:54.5878585Z #define __CLANG_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:46:54.5878946Z #define __CLANG_LIMITS_H 2025-05-07T19:46:54.5879231Z #define __CLANG_MAX_ALIGN_T_DEFINED 2025-05-07T19:46:54.5879691Z #define __CLOCKID_T_TYPE 
__S32_TYPE 2025-05-07T19:46:54.5880116Z #define __CLOCK_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:54.5880467Z #define __COMMON_FUNCTIONS_H__ 2025-05-07T19:46:54.5880734Z #define __COMPAR_FN_T 2025-05-07T19:46:54.5881017Z #define __CONCAT(x,y) x ## y 2025-05-07T19:46:54.5881292Z #define __CONSTANT_CFSTRINGS__ 1 2025-05-07T19:46:54.5881736Z #define __CUDACC_DEVICE_ATOMIC_BUILTINS__ 1 2025-05-07T19:46:54.5882087Z #define __CUDACC_VER_BUILD__ 61 2025-05-07T19:46:54.5882365Z #define __CUDACC_VER_MAJOR__ 12 2025-05-07T19:46:54.5882665Z #define __CUDACC_VER_MINOR__ 8 2025-05-07T19:46:54.5883273Z #define __CUDACC_VER__ "__CUDACC_VER__ is no longer supported. Use __CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__, and __CUDACC_VER_BUILD__ instead." 2025-05-07T19:46:54.5883929Z #define __CUDACC__ 1 2025-05-07T19:46:54.5884184Z #define __CUDART_API_PTDS(api) api 2025-05-07T19:46:54.5884510Z #define __CUDART_API_PTSZ(api) api 2025-05-07T19:46:54.5884964Z #define __CUDART_API_VERSION ((__CUDA_API_VER_MAJOR__ * 1000) + (__CUDA_API_VER_MINOR__ * 10)) 2025-05-07T19:46:54.5885468Z #define __CUDA_API_VER_MAJOR__ 12 2025-05-07T19:46:54.5885784Z #define __CUDA_API_VER_MINOR__ 8 2025-05-07T19:46:54.5886142Z #define __CUDA_ARCH_HAS_FEATURE__(_FEAT) __CUDA_ARCH_FEAT_##_FEAT 2025-05-07T19:46:54.5886552Z #define __CUDA_ARCH_LIST__ 520 2025-05-07T19:46:54.5886828Z #define __CUDA_ARCH__ 520 2025-05-07T19:46:54.5887124Z #define __CUDA_DEVICE_RUNTIME_API_H__ 2025-05-07T19:46:54.5887419Z #define __CUDA_MATH_CRTIMP 2025-05-07T19:46:54.5887708Z #define __CUDA_RUNTIME_API_H__ 2025-05-07T19:46:54.5887987Z #define __CUDA_RUNTIME_H__ 2025-05-07T19:46:54.5888276Z #define __DADDR_T_TYPE __S32_TYPE 2025-05-07T19:46:54.5888521Z #define __DBL_DECIMAL_DIG__ 17 2025-05-07T19:46:54.5888788Z #define __DBL_DENORM_MIN__ 4.9406564584124654e-324 2025-05-07T19:46:54.5889096Z #define __DBL_DIG__ 15 2025-05-07T19:46:54.5889327Z #define __DBL_EPSILON__ 2.2204460492503131e-16 2025-05-07T19:46:54.5889658Z #define 
__DBL_HAS_DENORM__ 1 2025-05-07T19:46:54.5889923Z #define __DBL_HAS_INFINITY__ 1 2025-05-07T19:46:54.5890212Z #define __DBL_HAS_QUIET_NAN__ 1 2025-05-07T19:46:54.5890482Z #define __DBL_MANT_DIG__ 53 2025-05-07T19:46:54.5890763Z #define __DBL_MAX_10_EXP__ 308 2025-05-07T19:46:54.5891030Z #define __DBL_MAX_EXP__ 1024 2025-05-07T19:46:54.5891323Z #define __DBL_MAX__ 1.7976931348623157e+308 2025-05-07T19:46:54.5891733Z #define __DBL_MIN_10_EXP__ (-307) 2025-05-07T19:46:54.5892015Z #define __DBL_MIN_EXP__ (-1021) 2025-05-07T19:46:54.5892317Z #define __DBL_MIN__ 2.2250738585072014e-308 2025-05-07T19:46:54.5892631Z #define __DECIMAL_DIG__ __LDBL_DECIMAL_DIG__ 2025-05-07T19:46:54.5892965Z #define __DELETE_THROW throw() 2025-05-07T19:46:54.5893227Z #define __DEPRECATED 1 2025-05-07T19:46:54.5893514Z #define __DEVICE_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:54.5893828Z #define __DEVICE_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:54.5894158Z #define __DEVICE_DOUBLE_FUNCTIONS_HPP__ 2025-05-07T19:46:54.5894471Z #define __DEVICE_DOUBLE_FUNCTIONS_H__ 2025-05-07T19:46:54.5894792Z #define __DEVICE_FUNCTIONS_HPP__ 2025-05-07T19:46:54.5895095Z #define __DEVICE_FUNCTIONS_H__ 2025-05-07T19:46:54.5895378Z #define __DEVICE_LAUNCH_PARAMETERS_H__ 2025-05-07T19:46:54.5895697Z #define __DEVICE_TYPES_H__ 2025-05-07T19:46:54.5895964Z #define __DEV_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:54.5896273Z #define __DRIVER_FUNCTIONS_H__ 2025-05-07T19:46:54.5896550Z #define __DRIVER_TYPES_H__ 2025-05-07T19:46:54.5896819Z #define __ELF__ 1 2025-05-07T19:46:54.5897039Z #define __END_DECLS } 2025-05-07T19:46:54.5897353Z #define __END_NAMESPACE_C99 2025-05-07T19:46:54.5897765Z #define __END_NAMESPACE_STD 2025-05-07T19:46:54.5898039Z #define __EXCEPTIONS 1 2025-05-07T19:46:54.5898393Z #define __EXCEPTION_H 1 2025-05-07T19:46:54.5898658Z #define __FDS_BITS(set) ((set)->fds_bits) 2025-05-07T19:46:54.5899121Z #define __FD_CLR(d,set) ((void) (__FDS_BITS (set)[__FD_ELT (d)] &= ~__FD_MASK (d))) 2025-05-07T19:46:54.5899576Z 
#define __FD_ELT(d) ((d) / __NFDBITS) 2025-05-07T19:46:54.5900020Z #define __FD_ISSET(d,set) ((__FDS_BITS (set)[__FD_ELT (d)] & __FD_MASK (d)) != 0) 2025-05-07T19:46:54.5900508Z #define __FD_MASK(d) ((__fd_mask) 1 << ((d) % __NFDBITS)) 2025-05-07T19:46:54.5901014Z #define __FD_SET(d,set) ((void) (__FDS_BITS (set)[__FD_ELT (d)] |= __FD_MASK (d))) 2025-05-07T19:46:54.5901449Z #define __FD_SETSIZE 1024 2025-05-07T19:46:54.5902267Z #define __FD_ZERO(fdsp) do { int __d0, __d1; __asm__ __volatile__ ("cld; rep; " __FD_ZERO_STOS : "=c" (__d0), "=D" (__d1) : "a" (0), "0" (sizeof (fd_set) / sizeof (__fd_mask)), "1" (&__FDS_BITS (fdsp)[0]) : "memory"); } while (0) 2025-05-07T19:46:54.5903079Z #define __FD_ZERO_STOS "stosq" 2025-05-07T19:46:54.5903362Z #define __FILE_defined 1 2025-05-07T19:46:54.5903626Z #define __FINITE_MATH_ONLY__ 0 2025-05-07T19:46:54.5903879Z #define __FLOAT128__ 1 2025-05-07T19:46:54.5904141Z #define __FLOAT_WORD_ORDER __BYTE_ORDER 2025-05-07T19:46:54.5904447Z #define __FLT16_DECIMAL_DIG__ 5 2025-05-07T19:46:54.5904755Z #define __FLT16_DENORM_MIN__ 5.9604644775390625e-8F16 2025-05-07T19:46:54.5905091Z #define __FLT16_DIG__ 3 2025-05-07T19:46:54.5905343Z #define __FLT16_EPSILON__ 9.765625e-4F16 2025-05-07T19:46:54.5905651Z #define __FLT16_HAS_DENORM__ 1 2025-05-07T19:46:54.5905921Z #define __FLT16_HAS_INFINITY__ 1 2025-05-07T19:46:54.5906217Z #define __FLT16_HAS_QUIET_NAN__ 1 2025-05-07T19:46:54.5906490Z #define __FLT16_MANT_DIG__ 11 2025-05-07T19:46:54.5906757Z #define __FLT16_MAX_10_EXP__ 4 2025-05-07T19:46:54.5907010Z #define __FLT16_MAX_EXP__ 16 2025-05-07T19:46:54.5907289Z #define __FLT16_MAX__ 6.5504e+4F16 2025-05-07T19:46:54.5907578Z #define __FLT16_MIN_10_EXP__ (-4) 2025-05-07T19:46:54.5907850Z #define __FLT16_MIN_EXP__ (-13) 2025-05-07T19:46:54.5908123Z #define __FLT16_MIN__ 6.103515625e-5F16 2025-05-07T19:46:54.5908408Z #define __FLT_DECIMAL_DIG__ 9 2025-05-07T19:46:54.5908690Z #define __FLT_DENORM_MIN__ 1.40129846e-45F 
2025-05-07T19:46:54.5908979Z #define __FLT_DIG__ 6 2025-05-07T19:46:54.5909229Z #define __FLT_EPSILON__ 1.19209290e-7F 2025-05-07T19:46:54.5909513Z #define __FLT_HAS_DENORM__ 1 2025-05-07T19:46:54.5909893Z #define __FLT_HAS_INFINITY__ 1 2025-05-07T19:46:54.5910134Z #define __FLT_HAS_QUIET_NAN__ 1 2025-05-07T19:46:54.5910386Z #define __FLT_MANT_DIG__ 24 2025-05-07T19:46:54.5910626Z #define __FLT_MAX_10_EXP__ 38 2025-05-07T19:46:54.5910865Z #define __FLT_MAX_EXP__ 128 2025-05-07T19:46:54.5911165Z #define __FLT_MAX__ 3.40282347e+38F 2025-05-07T19:46:54.5911418Z #define __FLT_MIN_10_EXP__ (-37) 2025-05-07T19:46:54.5911676Z #define __FLT_MIN_EXP__ (-125) 2025-05-07T19:46:54.5911915Z #define __FLT_MIN__ 1.17549435e-38F 2025-05-07T19:46:54.5912170Z #define __FLT_RADIX__ 2 2025-05-07T19:46:54.5912399Z #define __FSBLKCNT64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:54.5912715Z #define __FSBLKCNT_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:54.5913018Z #define __FSFILCNT64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:54.5913326Z #define __FSFILCNT_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:54.5913645Z #define __FSID_T_TYPE struct { int __val[2]; } 2025-05-07T19:46:54.5913948Z #define __FSWORD_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:54.5914230Z #define __FXSR__ 1 2025-05-07T19:46:54.5914441Z #define __GCC_ASM_FLAG_OUTPUTS__ 1 2025-05-07T19:46:54.5914718Z #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:46:54.5914992Z #define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:46:54.5915286Z #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:46:54.5915568Z #define __GCC_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:46:54.5915846Z #define __GCC_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:46:54.5916122Z #define __GCC_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:46:54.5916392Z #define __GCC_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:46:54.5916674Z #define __GCC_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:46:54.5916957Z #define __GCC_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:46:54.5917253Z #define 
__GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 2025-05-07T19:46:54.5917539Z #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:46:54.5917829Z #define __GCC_HAVE_DWARF2_CFI_ASM 1 2025-05-07T19:46:54.5918102Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 2025-05-07T19:46:54.5918416Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 2025-05-07T19:46:54.5918711Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 2025-05-07T19:46:54.5919020Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 2025-05-07T19:46:54.5919315Z #define __GID_T_TYPE __U32_TYPE 2025-05-07T19:46:54.5919633Z #define __GLIBCXX_BITSIZE_INT_N_0 128 2025-05-07T19:46:54.5919919Z #define __GLIBCXX_TYPE_INT_N_0 __int128 2025-05-07T19:46:54.5920182Z #define __GLIBCXX__ 20230528 2025-05-07T19:46:54.5920435Z #define __GLIBC_HAVE_LONG_LONG 1 2025-05-07T19:46:54.5920679Z #define __GLIBC_MINOR__ 17 2025-05-07T19:46:54.5921073Z #define __GLIBC_PREREQ(maj,min) ((__GLIBC__ << 16) + __GLIBC_MINOR__ >= ((maj) << 16) + (min)) 2025-05-07T19:46:54.5921481Z #define __GLIBC__ 2 2025-05-07T19:46:54.5921694Z #define __GNUC_GNU_INLINE__ 1 2025-05-07T19:46:54.5921936Z #define __GNUC_MINOR__ 2 2025-05-07T19:46:54.5922156Z #define __GNUC_PATCHLEVEL__ 1 2025-05-07T19:46:54.5922533Z #define __GNUC_PREREQ(maj,min) ((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min)) 2025-05-07T19:46:54.5922931Z #define __GNUC_VA_LIST 2025-05-07T19:46:54.5923149Z #define __GNUC__ 4 2025-05-07T19:46:54.5923341Z #define __GNUG__ 4 2025-05-07T19:46:54.5923552Z #define __GNU_LIBRARY__ 6 2025-05-07T19:46:54.5923781Z #define __GXX_ABI_VERSION 1002 2025-05-07T19:46:54.5924222Z #define __GXX_EXPERIMENTAL_CXX0X__ 1 2025-05-07T19:46:54.5924490Z #define __GXX_RTTI 1 2025-05-07T19:46:54.5924713Z #define __GXX_WEAK__ 1 2025-05-07T19:46:54.5924937Z #define __HAVE_COLUMN 2025-05-07T19:46:54.5925156Z #define __HOST_CONFIG_H__ 2025-05-07T19:46:54.5925404Z #define __HOST_DEFINES_H__ 2025-05-07T19:46:54.5925647Z #define __ID_T_TYPE __U32_TYPE 2025-05-07T19:46:54.5925910Z #define 
__INO64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:54.5926177Z #define __INO_T_MATCHES_INO64_T 1 2025-05-07T19:46:54.5926471Z #define __INO_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:54.5926761Z #define __INT16_C_SUFFIX__ 2025-05-07T19:46:54.5927007Z #define __INT16_FMTd__ "hd" 2025-05-07T19:46:54.5927243Z #define __INT16_FMTi__ "hi" 2025-05-07T19:46:54.5927486Z #define __INT16_MAX__ 32767 2025-05-07T19:46:54.5927731Z #define __INT16_TYPE__ short 2025-05-07T19:46:54.5927972Z #define __INT32_C_SUFFIX__ 2025-05-07T19:46:54.5928219Z #define __INT32_FMTd__ "d" 2025-05-07T19:46:54.5928451Z #define __INT32_FMTi__ "i" 2025-05-07T19:46:54.5928755Z #define __INT32_MAX__ 2147483647 2025-05-07T19:46:54.5929004Z #define __INT32_TYPE__ int 2025-05-07T19:46:54.5929248Z #define __INT64_C_SUFFIX__ L 2025-05-07T19:46:54.5929487Z #define __INT64_FMTd__ "ld" 2025-05-07T19:46:54.5929761Z #define __INT64_FMTi__ "li" 2025-05-07T19:46:54.5930020Z #define __INT64_MAX__ 9223372036854775807L 2025-05-07T19:46:54.5930346Z #define __INT64_TYPE__ long int 2025-05-07T19:46:54.5930635Z #define __INT8_C_SUFFIX__ 2025-05-07T19:46:54.5930879Z #define __INT8_FMTd__ "hhd" 2025-05-07T19:46:54.5931142Z #define __INT8_FMTi__ "hhi" 2025-05-07T19:46:54.5931370Z #define __INT8_MAX__ 127 2025-05-07T19:46:54.5931624Z #define __INT8_TYPE__ signed char 2025-05-07T19:46:54.5931888Z #define __INTMAX_C_SUFFIX__ L 2025-05-07T19:46:54.5932154Z #define __INTMAX_FMTd__ "ld" 2025-05-07T19:46:54.5932406Z #define __INTMAX_FMTi__ "li" 2025-05-07T19:46:54.5932675Z #define __INTMAX_MAX__ 9223372036854775807L 2025-05-07T19:46:54.5932967Z #define __INTMAX_TYPE__ long int 2025-05-07T19:46:54.5933246Z #define __INTMAX_WIDTH__ 64 2025-05-07T19:46:54.5933493Z #define __INTPTR_FMTd__ "ld" 2025-05-07T19:46:54.5933750Z #define __INTPTR_FMTi__ "li" 2025-05-07T19:46:54.5934021Z #define __INTPTR_MAX__ 9223372036854775807L 2025-05-07T19:46:54.5934309Z #define __INTPTR_TYPE__ long int 2025-05-07T19:46:54.5934581Z #define __INTPTR_WIDTH__ 64 
2025-05-07T19:46:54.5934829Z #define __INT_FAST16_FMTd__ "hd" 2025-05-07T19:46:54.5934935Z #define __INT_FAST16_FMTi__ "hi" 2025-05-07T19:46:54.5935023Z #define __INT_FAST16_MAX__ 32767 2025-05-07T19:46:54.5935127Z #define __INT_FAST16_TYPE__ short 2025-05-07T19:46:54.5935232Z #define __INT_FAST16_WIDTH__ 16 2025-05-07T19:46:54.5935355Z #define __INT_FAST32_FMTd__ "d" 2025-05-07T19:46:54.5935449Z #define __INT_FAST32_FMTi__ "i" 2025-05-07T19:46:54.5935550Z #define __INT_FAST32_MAX__ 2147483647 2025-05-07T19:46:54.5935666Z #define __INT_FAST32_TYPE__ int 2025-05-07T19:46:54.5935754Z #define __INT_FAST32_WIDTH__ 32 2025-05-07T19:46:54.5935908Z #define __INT_FAST64_FMTd__ "ld" 2025-05-07T19:46:54.5936004Z #define __INT_FAST64_FMTi__ "li" 2025-05-07T19:46:54.5936145Z #define __INT_FAST64_MAX__ 9223372036854775807L 2025-05-07T19:46:54.5936248Z #define __INT_FAST64_TYPE__ long int 2025-05-07T19:46:54.5936346Z #define __INT_FAST64_WIDTH__ 64 2025-05-07T19:46:54.5936471Z #define __INT_FAST8_FMTd__ "hhd" 2025-05-07T19:46:54.5936570Z #define __INT_FAST8_FMTi__ "hhi" 2025-05-07T19:46:54.5936667Z #define __INT_FAST8_MAX__ 127 2025-05-07T19:46:54.5936773Z #define __INT_FAST8_TYPE__ signed char 2025-05-07T19:46:54.5936899Z #define __INT_FAST8_WIDTH__ 8 2025-05-07T19:46:54.5936995Z #define __INT_LEAST16_FMTd__ "hd" 2025-05-07T19:46:54.5937153Z #define __INT_LEAST16_FMTi__ "hi" 2025-05-07T19:46:54.5937289Z #define __INT_LEAST16_MAX__ 32767 2025-05-07T19:46:54.5937398Z #define __INT_LEAST16_TYPE__ short 2025-05-07T19:46:54.5937492Z #define __INT_LEAST16_WIDTH__ 16 2025-05-07T19:46:54.5937755Z #define __INT_LEAST32_FMTd__ "d" 2025-05-07T19:46:54.5937889Z #define __INT_LEAST32_FMTi__ "i" 2025-05-07T19:46:54.5937996Z #define __INT_LEAST32_MAX__ 2147483647 2025-05-07T19:46:54.5938091Z #define __INT_LEAST32_TYPE__ int 2025-05-07T19:46:54.5938195Z #define __INT_LEAST32_WIDTH__ 32 2025-05-07T19:46:54.5938303Z #define __INT_LEAST64_FMTd__ "ld" 2025-05-07T19:46:54.5938397Z #define 
__INT_LEAST64_FMTi__ "li" 2025-05-07T19:46:54.5938514Z #define __INT_LEAST64_MAX__ 9223372036854775807L 2025-05-07T19:46:54.5938626Z #define __INT_LEAST64_TYPE__ long int 2025-05-07T19:46:54.5938717Z #define __INT_LEAST64_WIDTH__ 64 2025-05-07T19:46:54.5938807Z #define __INT_LEAST8_FMTd__ "hhd" 2025-05-07T19:46:54.5938908Z #define __INT_LEAST8_FMTi__ "hhi" 2025-05-07T19:46:54.5939002Z #define __INT_LEAST8_MAX__ 127 2025-05-07T19:46:54.5939102Z #define __INT_LEAST8_TYPE__ signed char 2025-05-07T19:46:54.5939193Z #define __INT_LEAST8_WIDTH__ 8 2025-05-07T19:46:54.5939299Z #define __INT_MAX__ 2147483647 2025-05-07T19:46:54.5939385Z #define __INT_WIDTH__ 32 2025-05-07T19:46:54.5939478Z #define __KERNEL_STRICT_NAMES 2025-05-07T19:46:54.5939643Z #define __KEY_T_TYPE __S32_TYPE 2025-05-07T19:46:54.5939738Z #define __LDBL_DECIMAL_DIG__ 21 2025-05-07T19:46:54.5939881Z #define __LDBL_DENORM_MIN__ 3.64519953188247460253e-4951L 2025-05-07T19:46:54.5939967Z #define __LDBL_DIG__ 18 2025-05-07T19:46:54.5940106Z #define __LDBL_EPSILON__ 1.08420217248550443401e-19L 2025-05-07T19:46:54.5940199Z #define __LDBL_HAS_DENORM__ 1 2025-05-07T19:46:54.5940291Z #define __LDBL_HAS_INFINITY__ 1 2025-05-07T19:46:54.5940399Z #define __LDBL_HAS_QUIET_NAN__ 1 2025-05-07T19:46:54.5940494Z #define __LDBL_MANT_DIG__ 64 2025-05-07T19:46:54.5940592Z #define __LDBL_MAX_10_EXP__ 4932 2025-05-07T19:46:54.5940689Z #define __LDBL_MAX_EXP__ 16384 2025-05-07T19:46:54.5940830Z #define __LDBL_MAX__ 1.18973149535723176502e+4932L 2025-05-07T19:46:54.5940943Z #define __LDBL_MIN_10_EXP__ (-4931) 2025-05-07T19:46:54.5941060Z #define __LDBL_MIN_EXP__ (-16381) 2025-05-07T19:46:54.5941217Z #define __LDBL_MIN__ 3.36210314311209350626e-4932L 2025-05-07T19:46:54.5941361Z #define __LDBL_REDIR(name,proto) name proto 2025-05-07T19:46:54.5941510Z #define __LDBL_REDIR1(name,proto,alias) name proto 2025-05-07T19:46:54.5941728Z #define __LDBL_REDIR1_NTH(name,proto,alias) name proto __THROW 2025-05-07T19:46:54.5941842Z #define 
__LDBL_REDIR_DECL(name) 2025-05-07T19:46:54.5942007Z #define __LDBL_REDIR_NTH(name,proto) name proto __THROW 2025-05-07T19:46:54.5942107Z #define __LEAF 2025-05-07T19:46:54.5942240Z #define __LEAF_ATTR 2025-05-07T19:46:54.5942349Z #define __LIBRARY_TYPES_H__ 2025-05-07T19:46:54.5942456Z #define __LITTLE_ENDIAN 1234 2025-05-07T19:46:54.5942589Z #define __LITTLE_ENDIAN__ 1 2025-05-07T19:46:54.5942696Z #define __LLONG_WIDTH__ 64 2025-05-07T19:46:54.5942827Z #define __LONG_LONG_MAX__ 9223372036854775807LL 2025-05-07T19:46:54.5942948Z #define __LONG_LONG_PAIR(HI,LO) LO, HI 2025-05-07T19:46:54.5943096Z #define __LONG_MAX__ 9223372036854775807L 2025-05-07T19:46:54.5943202Z #define __LONG_WIDTH__ 64 2025-05-07T19:46:54.5943300Z #define __LP64__ 1 2025-05-07T19:46:54.5943759Z #define __MATHCALLX(function,suffix,args,attrib) __MATHDECLX (_Mdouble_,function,suffix, args, attrib) 2025-05-07T19:46:54.5944448Z #define __MATHDECLX(type,function,suffix,args,attrib) __MATHDECL_1(type, function,suffix, args) __attribute__ (attrib); __MATHDECL_1(type, __CONCAT(__,function),suffix, args) __attribute__ (attrib) 2025-05-07T19:46:54.5944569Z #define __MATH_DECLARE_LDOUBLE 1 2025-05-07T19:46:54.5944720Z #define __MATH_FUNCTIONS_HPP__ 2025-05-07T19:46:54.5944836Z #define __MATH_FUNCTIONS_H__ 2025-05-07T19:46:54.5944936Z #define __MMX__ 1 2025-05-07T19:46:54.5945051Z #define __MODE_T_TYPE __U32_TYPE 2025-05-07T19:46:54.5945192Z #define __N(msgid) (msgid) 2025-05-07T19:46:54.5945336Z #define __NFDBITS (8 * (int) sizeof (__fd_mask)) 2025-05-07T19:46:54.5945475Z #define __NLINK_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:54.5945611Z #define __NO_CTYPE 1 2025-05-07T19:46:54.5945718Z #define __NO_INLINE__ 1 2025-05-07T19:46:54.5945828Z #define __NO_MATH_INLINES 1 2025-05-07T19:46:54.5945949Z #define __NTH(fct) __LEAF_ATTR fct throw () 2025-05-07T19:46:54.5946078Z #define __NVCC_DIAG_PRAGMA_SUPPORT__ 1 2025-05-07T19:46:54.5946161Z #define __NVCC__ 1 2025-05-07T19:46:54.5946262Z #define 
__NV_GLIBCXX_VERSION 40800 2025-05-07T19:46:54.5946372Z #define __NV_LEGACY_LAUNCH 1 2025-05-07T19:46:54.5946476Z #define __NV_NO_HOST_COMPILER_CHECK 1 2025-05-07T19:46:54.5946571Z #define __OBJC_BOOL_IS_BOOL 0 2025-05-07T19:46:54.5946670Z #define __OFF64_T_TYPE __SQUAD_TYPE 2025-05-07T19:46:54.5946782Z #define __OFF_T_MATCHES_OFF64_T 1 2025-05-07T19:46:54.5946892Z #define __OFF_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:54.5947020Z #define __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES 3 2025-05-07T19:46:54.5947138Z #define __OPENCL_MEMORY_SCOPE_DEVICE 2 2025-05-07T19:46:54.5947247Z #define __OPENCL_MEMORY_SCOPE_SUB_GROUP 4 2025-05-07T19:46:54.5947358Z #define __OPENCL_MEMORY_SCOPE_WORK_GROUP 1 2025-05-07T19:46:54.5947468Z #define __OPENCL_MEMORY_SCOPE_WORK_ITEM 0 2025-05-07T19:46:54.5947585Z #define __ORDER_BIG_ENDIAN__ 4321 2025-05-07T19:46:54.5947739Z #define __ORDER_LITTLE_ENDIAN__ 1234 2025-05-07T19:46:54.5947839Z #define __ORDER_PDP_ENDIAN__ 3412 2025-05-07T19:46:54.5947942Z #define __P(args) args 2025-05-07T19:46:54.5948035Z #define __PDP_ENDIAN 3412 2025-05-07T19:46:54.5948118Z #define __PIC__ 2 2025-05-07T19:46:54.5948217Z #define __PID_T_TYPE __S32_TYPE 2025-05-07T19:46:54.5948314Z #define __PIE__ 2 2025-05-07T19:46:54.5948405Z #define __PMT(args) args 2025-05-07T19:46:54.5948501Z #define __POINTER_WIDTH__ 64 2025-05-07T19:46:54.5948618Z #define __PRAGMA_REDEFINE_EXTNAME 1 2025-05-07T19:46:54.5948719Z #define __PTHREAD_MUTEX_HAVE_PREV 1 2025-05-07T19:46:54.5948834Z #define __PTHREAD_RWLOCK_INT_FLAGS_SHARED 1 2025-05-07T19:46:54.5948930Z #define __PTHREAD_SPINS 0, 0 2025-05-07T19:46:54.5949041Z #define __PTRDIFF_FMTd__ "ld" 2025-05-07T19:46:54.5949140Z #define __PTRDIFF_FMTi__ "li" 2025-05-07T19:46:54.5949250Z #define __PTRDIFF_MAX__ 9223372036854775807L 2025-05-07T19:46:54.5949476Z #define __PTRDIFF_TYPE__ long int 2025-05-07T19:46:54.5949570Z #define __PTRDIFF_WIDTH__ 64 2025-05-07T19:46:54.5949782Z #define __REDIRECT(name,proto,alias) name proto __asm__ 
(__ASMNAME (#alias)) 2025-05-07T19:46:54.5949998Z #define __REDIRECT_LDBL(name,proto,alias) __REDIRECT (name, proto, alias) 2025-05-07T19:46:54.5950240Z #define __REDIRECT_NTH(name,proto,alias) name proto __THROW __asm__ (__ASMNAME (#alias)) 2025-05-07T19:46:54.5950501Z #define __REDIRECT_NTHNL(name,proto,alias) name proto __THROWNL __asm__ (__ASMNAME (#alias)) 2025-05-07T19:46:54.5950729Z #define __REDIRECT_NTH_LDBL(name,proto,alias) __REDIRECT_NTH (name, proto, alias) 2025-05-07T19:46:54.5950833Z #define __REGISTER_PREFIX__ 2025-05-07T19:46:54.5950928Z #define __RLIM64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:54.5951032Z #define __RLIM_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:54.5951133Z #define __S16_TYPE short int 2025-05-07T19:46:54.5951214Z #define __S32_TYPE int 2025-05-07T19:46:54.5951299Z #define __S64_TYPE long int 2025-05-07T19:46:54.5951442Z #define __SCHAR_MAX__ 127 2025-05-07T19:46:54.5951538Z #define __SEG_FS 1 2025-05-07T19:46:54.5951615Z #define __SEG_GS 1 2025-05-07T19:46:54.5951701Z #define __SHRT_MAX__ 32767 2025-05-07T19:46:54.5951800Z #define __SHRT_WIDTH__ 16 2025-05-07T19:46:54.5951894Z #define __SIG_ATOMIC_MAX__ 2147483647 2025-05-07T19:46:54.5951984Z #define __SIG_ATOMIC_WIDTH__ 32 2025-05-07T19:46:54.5952071Z #define __SIZEOF_DOUBLE__ 8 2025-05-07T19:46:54.5952176Z #define __SIZEOF_FLOAT128__ 16 2025-05-07T19:46:54.5952262Z #define __SIZEOF_FLOAT__ 4 2025-05-07T19:46:54.5952349Z #define __SIZEOF_INT128__ 16 2025-05-07T19:46:54.5952446Z #define __SIZEOF_INT__ 4 2025-05-07T19:46:54.5952540Z #define __SIZEOF_LONG_DOUBLE__ 16 2025-05-07T19:46:54.5952630Z #define __SIZEOF_LONG_LONG__ 8 2025-05-07T19:46:54.5952717Z #define __SIZEOF_LONG__ 8 2025-05-07T19:46:54.5952822Z #define __SIZEOF_POINTER__ 8 2025-05-07T19:46:54.5952920Z #define __SIZEOF_PTHREAD_ATTR_T 56 2025-05-07T19:46:54.5953024Z #define __SIZEOF_PTHREAD_BARRIERATTR_T 4 2025-05-07T19:46:54.5953148Z #define __SIZEOF_PTHREAD_BARRIER_T 32 2025-05-07T19:46:54.5953247Z #define 
__SIZEOF_PTHREAD_CONDATTR_T 4 2025-05-07T19:46:54.5953341Z #define __SIZEOF_PTHREAD_COND_T 48 2025-05-07T19:46:54.5953443Z #define __SIZEOF_PTHREAD_MUTEXATTR_T 4 2025-05-07T19:46:54.5953551Z #define __SIZEOF_PTHREAD_MUTEX_T 40 2025-05-07T19:46:54.5953652Z #define __SIZEOF_PTHREAD_RWLOCKATTR_T 8 2025-05-07T19:46:54.5953745Z #define __SIZEOF_PTHREAD_RWLOCK_T 56 2025-05-07T19:46:54.5953852Z #define __SIZEOF_PTRDIFF_T__ 8 2025-05-07T19:46:54.5953939Z #define __SIZEOF_SHORT__ 2 2025-05-07T19:46:54.5954026Z #define __SIZEOF_SIZE_T__ 8 2025-05-07T19:46:54.5954127Z #define __SIZEOF_WCHAR_T__ 4 2025-05-07T19:46:54.5954212Z #define __SIZEOF_WINT_T__ 4 2025-05-07T19:46:54.5954299Z #define __SIZE_FMTX__ "lX" 2025-05-07T19:46:54.5954384Z #define __SIZE_FMTo__ "lo" 2025-05-07T19:46:54.5954486Z #define __SIZE_FMTu__ "lu" 2025-05-07T19:46:54.5954571Z #define __SIZE_FMTx__ "lx" 2025-05-07T19:46:54.5954670Z #define __SIZE_MAX__ 18446744073709551615UL 2025-05-07T19:46:54.5954832Z #define __SIZE_TYPE__ long unsigned int 2025-05-07T19:46:54.5954918Z #define __SIZE_WIDTH__ 64 2025-05-07T19:46:54.5955006Z #define __SLONG32_TYPE int 2025-05-07T19:46:54.5955098Z #define __SLONGWORD_TYPE long int 2025-05-07T19:46:54.5955196Z #define __SM_100_RT_HPP__ 2025-05-07T19:46:54.5955276Z #define __SM_100_RT_H__ 2025-05-07T19:46:54.5955373Z #define __SM_20_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:54.5955483Z #define __SM_20_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:54.5955574Z #define __SM_20_INTRINSICS_HPP__ 2025-05-07T19:46:54.5955663Z #define __SM_20_INTRINSICS_H__ 2025-05-07T19:46:54.5955753Z #define __SM_30_INTRINSICS_HPP__ 2025-05-07T19:46:54.5955856Z #define __SM_30_INTRINSICS_H__ 2025-05-07T19:46:54.5955953Z #define __SM_32_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:54.5956046Z #define __SM_32_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:54.5956152Z #define __SM_32_INTRINSICS_HPP__ 2025-05-07T19:46:54.5956241Z #define __SM_32_INTRINSICS_H__ 2025-05-07T19:46:54.5956340Z #define __SM_35_ATOMIC_FUNCTIONS_H__ 
2025-05-07T19:46:54.5956430Z #define __SM_35_INTRINSICS_H__ 2025-05-07T19:46:54.5956541Z #define __SM_60_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:54.5956637Z #define __SM_60_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:54.5956724Z #define __SM_61_INTRINSICS_HPP__ 2025-05-07T19:46:54.5956826Z #define __SM_61_INTRINSICS_H__ 2025-05-07T19:46:54.5956912Z #define __SM_70_RT_HPP__ 2025-05-07T19:46:54.5956993Z #define __SM_70_RT_H__ 2025-05-07T19:46:54.5957074Z #define __SM_80_RT_HPP__ 2025-05-07T19:46:54.5957170Z #define __SM_80_RT_H__ 2025-05-07T19:46:54.5957253Z #define __SM_90_RT_HPP__ 2025-05-07T19:46:54.5957332Z #define __SM_90_RT_H__ 2025-05-07T19:46:54.5957432Z #define __SQUAD_TYPE long int 2025-05-07T19:46:54.5957516Z #define __SSE2_MATH__ 1 2025-05-07T19:46:54.5957596Z #define __SSE2__ 1 2025-05-07T19:46:54.5957672Z #define __SSE_MATH__ 1 2025-05-07T19:46:54.5957760Z #define __SSE__ 1 2025-05-07T19:46:54.5957855Z #define __SSIZE_T_TYPE __SWORD_TYPE 2025-05-07T19:46:54.5958033Z #define __STDCPP_DEFAULT_NEW_ALIGNMENT__ 16UL 2025-05-07T19:46:54.5958150Z #define __STDCPP_MATH_SPEC_FUNCS__ 201003L 2025-05-07T19:46:54.5958240Z #define __STDCPP_THREADS__ 1 2025-05-07T19:46:54.5958323Z #define __STDC_HOSTED__ 1 2025-05-07T19:46:54.5958411Z #define __STDC_IEC_559_COMPLEX__ 1 2025-05-07T19:46:54.5958514Z #define __STDC_IEC_559__ 1 2025-05-07T19:46:54.5958604Z #define __STDC_ISO_10646__ 201103L 2025-05-07T19:46:54.5958694Z #define __STDC_NO_THREADS__ 1 2025-05-07T19:46:54.5958785Z #define __STDC_UTF_16__ 1 2025-05-07T19:46:54.5958871Z #define __STDC_UTF_32__ 1 2025-05-07T19:46:54.5958948Z #define __STDC__ 1 2025-05-07T19:46:54.5959025Z #define __STDDEF_H 2025-05-07T19:46:54.5959119Z #define __STRING(x) #x 2025-05-07T19:46:54.5959226Z #define __SURFACE_INDIRECT_FUNCTIONS_H__ 2025-05-07T19:46:54.5959316Z #define __SURFACE_TYPES_H__ 2025-05-07T19:46:54.5959445Z #define __SUSECONDS_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:54.5959536Z #define __SWORD_TYPE long int 
2025-05-07T19:46:54.5959651Z #define __SYSCALL_SLONG_TYPE __SLONGWORD_TYPE 2025-05-07T19:46:54.5959760Z #define __SYSCALL_ULONG_TYPE __ULONGWORD_TYPE 2025-05-07T19:46:54.5959859Z #define __SYSCALL_WORDSIZE 64 2025-05-07T19:46:54.5959961Z #define __TEXTURE_INDIRECT_FUNCTIONS_H__ 2025-05-07T19:46:54.5960050Z #define __TEXTURE_TYPES_H__ 2025-05-07T19:46:54.5960141Z #define __THROW throw () 2025-05-07T19:46:54.5960225Z #define __THROWNL throw () 2025-05-07T19:46:54.5960311Z #define __TIMER_T_TYPE void * 2025-05-07T19:46:54.5960413Z #define __TIME_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:54.5960514Z #define __U16_TYPE unsigned short int 2025-05-07T19:46:54.5960602Z #define __U32_TYPE unsigned int 2025-05-07T19:46:54.5960694Z #define __U64_TYPE unsigned long int 2025-05-07T19:46:54.5960791Z #define __UID_T_TYPE __U32_TYPE 2025-05-07T19:46:54.5960874Z #define __UINT16_C_SUFFIX__ 2025-05-07T19:46:54.5960959Z #define __UINT16_FMTX__ "hX" 2025-05-07T19:46:54.5961041Z #define __UINT16_FMTo__ "ho" 2025-05-07T19:46:54.5961187Z #define __UINT16_FMTu__ "hu" 2025-05-07T19:46:54.5961273Z #define __UINT16_FMTx__ "hx" 2025-05-07T19:46:54.5961356Z #define __UINT16_MAX__ 65535 2025-05-07T19:46:54.5961465Z #define __UINT16_TYPE__ unsigned short 2025-05-07T19:46:54.5961550Z #define __UINT32_C_SUFFIX__ U 2025-05-07T19:46:54.5961636Z #define __UINT32_FMTX__ "X" 2025-05-07T19:46:54.5961721Z #define __UINT32_FMTo__ "o" 2025-05-07T19:46:54.5961813Z #define __UINT32_FMTu__ "u" 2025-05-07T19:46:54.5961896Z #define __UINT32_FMTx__ "x" 2025-05-07T19:46:54.5961985Z #define __UINT32_MAX__ 4294967295U 2025-05-07T19:46:54.5962084Z #define __UINT32_TYPE__ unsigned int 2025-05-07T19:46:54.5962168Z #define __UINT64_C_SUFFIX__ UL 2025-05-07T19:46:54.5962259Z #define __UINT64_FMTX__ "lX" 2025-05-07T19:46:54.5962348Z #define __UINT64_FMTo__ "lo" 2025-05-07T19:46:54.5962441Z #define __UINT64_FMTu__ "lu" 2025-05-07T19:46:54.5962524Z #define __UINT64_FMTx__ "lx" 2025-05-07T19:46:54.5962626Z #define 
__UINT64_MAX__ 18446744073709551615UL 2025-05-07T19:46:54.5962768Z #define __UINT64_TYPE__ long unsigned int 2025-05-07T19:46:54.5962858Z #define __UINT8_C_SUFFIX__ 2025-05-07T19:46:54.5962958Z #define __UINT8_FMTX__ "hhX" 2025-05-07T19:46:54.5963051Z #define __UINT8_FMTo__ "hho" 2025-05-07T19:46:54.5963161Z #define __UINT8_FMTu__ "hhu" 2025-05-07T19:46:54.5963255Z #define __UINT8_FMTx__ "hhx" 2025-05-07T19:46:54.5963348Z #define __UINT8_MAX__ 255 2025-05-07T19:46:54.5963477Z #define __UINT8_TYPE__ unsigned char 2025-05-07T19:46:54.5963567Z #define __UINTMAX_C_SUFFIX__ UL 2025-05-07T19:46:54.5963669Z #define __UINTMAX_FMTX__ "lX" 2025-05-07T19:46:54.5963768Z #define __UINTMAX_FMTo__ "lo" 2025-05-07T19:46:54.5963886Z #define __UINTMAX_FMTu__ "lu" 2025-05-07T19:46:54.5963983Z #define __UINTMAX_FMTx__ "lx" 2025-05-07T19:46:54.5964103Z #define __UINTMAX_MAX__ 18446744073709551615UL 2025-05-07T19:46:54.5964240Z #define __UINTMAX_TYPE__ long unsigned int 2025-05-07T19:46:54.5964326Z #define __UINTMAX_WIDTH__ 64 2025-05-07T19:46:54.5964468Z #define __UINTPTR_FMTX__ "lX" 2025-05-07T19:46:54.5964573Z #define __UINTPTR_FMTo__ "lo" 2025-05-07T19:46:54.5964660Z #define __UINTPTR_FMTu__ "lu" 2025-05-07T19:46:54.5964749Z #define __UINTPTR_FMTx__ "lx" 2025-05-07T19:46:54.5964854Z #define __UINTPTR_MAX__ 18446744073709551615UL 2025-05-07T19:46:54.5964971Z #define __UINTPTR_TYPE__ long unsigned int 2025-05-07T19:46:54.5965060Z #define __UINTPTR_WIDTH__ 64 2025-05-07T19:46:54.5965148Z #define __UINT_FAST16_FMTX__ "hX" 2025-05-07T19:46:54.5965247Z #define __UINT_FAST16_FMTo__ "ho" 2025-05-07T19:46:54.5965500Z #define __UINT_FAST16_FMTu__ "hu" 2025-05-07T19:46:54.5965591Z #define __UINT_FAST16_FMTx__ "hx" 2025-05-07T19:46:54.5965679Z #define __UINT_FAST16_MAX__ 65535 2025-05-07T19:46:54.5965804Z #define __UINT_FAST16_TYPE__ unsigned short 2025-05-07T19:46:54.5965898Z #define __UINT_FAST32_FMTX__ "X" 2025-05-07T19:46:54.5965988Z #define __UINT_FAST32_FMTo__ "o" 
2025-05-07T19:46:54.5966096Z #define __UINT_FAST32_FMTu__ "u" 2025-05-07T19:46:54.5966190Z #define __UINT_FAST32_FMTx__ "x" 2025-05-07T19:46:54.5966291Z #define __UINT_FAST32_MAX__ 4294967295U 2025-05-07T19:46:54.5966393Z #define __UINT_FAST32_TYPE__ unsigned int 2025-05-07T19:46:54.5966501Z #define __UINT_FAST64_FMTX__ "lX" 2025-05-07T19:46:54.5966590Z #define __UINT_FAST64_FMTo__ "lo" 2025-05-07T19:46:54.5966681Z #define __UINT_FAST64_FMTu__ "lu" 2025-05-07T19:46:54.5966785Z #define __UINT_FAST64_FMTx__ "lx" 2025-05-07T19:46:54.5966903Z #define __UINT_FAST64_MAX__ 18446744073709551615UL 2025-05-07T19:46:54.5967020Z #define __UINT_FAST64_TYPE__ long unsigned int 2025-05-07T19:46:54.5967109Z #define __UINT_FAST8_FMTX__ "hhX" 2025-05-07T19:46:54.5967212Z #define __UINT_FAST8_FMTo__ "hho" 2025-05-07T19:46:54.5967301Z #define __UINT_FAST8_FMTu__ "hhu" 2025-05-07T19:46:54.5967531Z #define __UINT_FAST8_FMTx__ "hhx" 2025-05-07T19:46:54.5967638Z #define __UINT_FAST8_MAX__ 255 2025-05-07T19:46:54.5967910Z #define __UINT_FAST8_TYPE__ unsigned char 2025-05-07T19:46:54.5968007Z #define __UINT_LEAST16_FMTX__ "hX" 2025-05-07T19:46:54.5968163Z #define __UINT_LEAST16_FMTo__ "ho" 2025-05-07T19:46:54.5968276Z #define __UINT_LEAST16_FMTu__ "hu" 2025-05-07T19:46:54.5968373Z #define __UINT_LEAST16_FMTx__ "hx" 2025-05-07T19:46:54.5968462Z #define __UINT_LEAST16_MAX__ 65535 2025-05-07T19:46:54.5968588Z #define __UINT_LEAST16_TYPE__ unsigned short 2025-05-07T19:46:54.5968682Z #define __UINT_LEAST32_FMTX__ "X" 2025-05-07T19:46:54.5968777Z #define __UINT_LEAST32_FMTo__ "o" 2025-05-07T19:46:54.5968882Z #define __UINT_LEAST32_FMTu__ "u" 2025-05-07T19:46:54.5968985Z #define __UINT_LEAST32_FMTx__ "x" 2025-05-07T19:46:54.5969092Z #define __UINT_LEAST32_MAX__ 4294967295U 2025-05-07T19:46:54.5969204Z #define __UINT_LEAST32_TYPE__ unsigned int 2025-05-07T19:46:54.5969330Z #define __UINT_LEAST64_FMTX__ "lX" 2025-05-07T19:46:54.5969428Z #define __UINT_LEAST64_FMTo__ "lo" 2025-05-07T19:46:54.5969520Z 
#define __UINT_LEAST64_FMTu__ "lu" 2025-05-07T19:46:54.5969626Z #define __UINT_LEAST64_FMTx__ "lx" 2025-05-07T19:46:54.5969752Z #define __UINT_LEAST64_MAX__ 18446744073709551615UL 2025-05-07T19:46:54.5969884Z #define __UINT_LEAST64_TYPE__ long unsigned int 2025-05-07T19:46:54.5969978Z #define __UINT_LEAST8_FMTX__ "hhX" 2025-05-07T19:46:54.5970083Z #define __UINT_LEAST8_FMTo__ "hho" 2025-05-07T19:46:54.5970178Z #define __UINT_LEAST8_FMTu__ "hhu" 2025-05-07T19:46:54.5970271Z #define __UINT_LEAST8_FMTx__ "hhx" 2025-05-07T19:46:54.5970376Z #define __UINT_LEAST8_MAX__ 255 2025-05-07T19:46:54.5970488Z #define __UINT_LEAST8_TYPE__ unsigned char 2025-05-07T19:46:54.5970582Z #define __ULONG32_TYPE unsigned int 2025-05-07T19:46:54.5970691Z #define __ULONGWORD_TYPE unsigned long int 2025-05-07T19:46:54.5970805Z #define __UQUAD_TYPE unsigned long int 2025-05-07T19:46:54.5970905Z #define __USECONDS_T_TYPE __U32_TYPE 2025-05-07T19:46:54.5971000Z #define __USER_LABEL_PREFIX__ 2025-05-07T19:46:54.5971097Z #define __USE_ANSI 1 2025-05-07T19:46:54.5971181Z #define __USE_ATFILE 1 2025-05-07T19:46:54.5971264Z #define __USE_BSD 1 2025-05-07T19:46:54.5971354Z #define __USE_FORTIFY_LEVEL 0 2025-05-07T19:46:54.5971568Z #define __USE_GNU 1 2025-05-07T19:46:54.5971652Z #define __USE_ISOC11 1 2025-05-07T19:46:54.5971735Z #define __USE_ISOC95 1 2025-05-07T19:46:54.5971831Z #define __USE_ISOC99 1 2025-05-07T19:46:54.5971921Z #define __USE_ISOCXX11 1 2025-05-07T19:46:54.5972012Z #define __USE_LARGEFILE 1 2025-05-07T19:46:54.5972103Z #define __USE_LARGEFILE64 1 2025-05-07T19:46:54.5972197Z #define __USE_MISC 1 2025-05-07T19:46:54.5972284Z #define __USE_POSIX 1 2025-05-07T19:46:54.5972375Z #define __USE_POSIX199309 1 2025-05-07T19:46:54.5972481Z #define __USE_POSIX199506 1 2025-05-07T19:46:54.5972569Z #define __USE_POSIX2 1 2025-05-07T19:46:54.5972652Z #define __USE_SVID 1 2025-05-07T19:46:54.5972733Z #define __USE_UNIX98 1 2025-05-07T19:46:54.5972829Z #define __USE_XOPEN 1 
2025-05-07T19:46:54.5972917Z #define __USE_XOPEN2K 1 2025-05-07T19:46:54.5973010Z #define __USE_XOPEN2K8 1 2025-05-07T19:46:54.5973110Z #define __USE_XOPEN2K8XSI 1 2025-05-07T19:46:54.5973200Z #define __USE_XOPEN2KXSI 1 2025-05-07T19:46:54.5973301Z #define __USE_XOPEN_EXTENDED 1 2025-05-07T19:46:54.5973402Z #define __USING_NAMESPACE_C99(name) 2025-05-07T19:46:54.5973515Z #define __USING_NAMESPACE_STD(name) 2025-05-07T19:46:54.5973618Z #define __UWORD_TYPE unsigned long int 2025-05-07T19:46:54.5973717Z #define __VECTOR_FUNCTIONS_HPP__ 2025-05-07T19:46:54.5973820Z #define __VECTOR_FUNCTIONS_H__ 2025-05-07T19:46:54.5973910Z #define __VECTOR_TYPES_H__ 2025-05-07T19:46:54.5974363Z #define __VERSION__ "Clang 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:46:54.5974489Z #define __WAIT_INT(status) (*(int *) &(status)) 2025-05-07T19:46:54.5974594Z #define __WAIT_STATUS void * 2025-05-07T19:46:54.5974692Z #define __WAIT_STATUS_DEFN void * 2025-05-07T19:46:54.5974781Z #define __WALL 0x40000000 2025-05-07T19:46:54.5974881Z #define __WCHAR_MAX__ 2147483647 2025-05-07T19:46:54.5974970Z #define __WCHAR_TYPE__ int 2025-05-07T19:46:54.5975060Z #define __WCHAR_WIDTH__ 32 2025-05-07T19:46:54.5975228Z #define __WCLONE 0x80000000 2025-05-07T19:46:54.5975377Z #define __WCOREDUMP(status) ((status) & __WCOREFLAG) 2025-05-07T19:46:54.5975470Z #define __WCOREFLAG 0x80 2025-05-07T19:46:54.5975619Z #define __WEXITSTATUS(status) (((status) & 0xff00) >> 8) 2025-05-07T19:46:54.5975784Z #define __WIFCONTINUED(status) ((status) == __W_CONTINUED) 2025-05-07T19:46:54.5975926Z #define __WIFEXITED(status) (__WTERMSIG(status) == 0) 2025-05-07T19:46:54.5976156Z #define __WIFSIGNALED(status) (((signed char) (((status) & 0x7f) + 1) >> 1) > 0) 2025-05-07T19:46:54.5976317Z #define __WIFSTOPPED(status) (((status) & 0xff) == 0x7f) 2025-05-07T19:46:54.5976411Z #define __WINT_MAX__ 4294967295U 2025-05-07T19:46:54.5976512Z #define __WINT_TYPE__ 
unsigned int 2025-05-07T19:46:54.5976606Z #define __WINT_UNSIGNED__ 1 2025-05-07T19:46:54.5976709Z #define __WINT_WIDTH__ 32 2025-05-07T19:46:54.5976804Z #define __WNOTHREAD 0x20000000 2025-05-07T19:46:54.5976890Z #define __WORDSIZE 64 2025-05-07T19:46:54.5977013Z #define __WORDSIZE_TIME64_COMPAT32 1 2025-05-07T19:46:54.5977224Z #define __WSTOPSIG(status) __WEXITSTATUS(status) 2025-05-07T19:46:54.5977344Z #define __WTERMSIG(status) ((status) & 0x7f) 2025-05-07T19:46:54.5977440Z #define __W_CONTINUED 0xffff 2025-05-07T19:46:54.5977585Z #define __W_EXITCODE(ret,sig) ((ret) << 8 | (sig)) 2025-05-07T19:46:54.5977699Z #define __W_STOPCODE(sig) ((sig) << 8 | 0x7f) 2025-05-07T19:46:54.5977793Z #define ____FILE_defined 1 2025-05-07T19:46:54.5977908Z #define ____mbstate_t_defined 1 2025-05-07T19:46:54.5978031Z #define __align__(n) __attribute__((aligned(n))) 2025-05-07T19:46:54.5978223Z #define __always_inline __inline __attribute__ ((__always_inline__)) 2025-05-07T19:46:54.5978307Z #define __amd64 1 2025-05-07T19:46:54.5978404Z #define __amd64__ 1 2025-05-07T19:46:54.5978511Z #define __annotate__(a) __attribute__((a)) 2025-05-07T19:46:54.5978610Z #define __attribute_artificial__ 2025-05-07T19:46:54.5978770Z #define __attribute_const__ __attribute__ ((__const__)) 2025-05-07T19:46:54.5979019Z #define __attribute_deprecated__ __attribute__ ((__deprecated__)) 2025-05-07T19:46:54.5979228Z #define __attribute_format_arg__(x) __attribute__ ((__format_arg__ (x))) 2025-05-07T19:46:54.5979512Z #define __attribute_format_strfmon__(a,b) __attribute__ ((__format__ (__strfmon__, a, b))) 2025-05-07T19:46:54.5979667Z #define __attribute_malloc__ __attribute__ ((__malloc__)) 2025-05-07T19:46:54.5979836Z #define __attribute_noinline__ __attribute__ ((__noinline__)) 2025-05-07T19:46:54.5979973Z #define __attribute_pure__ __attribute__ ((__pure__)) 2025-05-07T19:46:54.5980126Z #define __attribute_used__ __attribute__ ((__used__)) 2025-05-07T19:46:54.5980368Z #define 
__attribute_warn_unused_result__ __attribute__ ((__warn_unused_result__)) 2025-05-07T19:46:54.5980465Z #define __blkcnt_t_defined 2025-05-07T19:46:54.5980577Z #define __blksize_t_defined 2025-05-07T19:46:54.5980777Z #define __bos(ptr) __builtin_object_size (ptr, __USE_FORTIFY_LEVEL > 1) 2025-05-07T19:46:54.5980907Z #define __bos0(ptr) __builtin_object_size (ptr, 0) 2025-05-07T19:46:54.5981011Z #define __bounded 2025-05-07T19:46:54.5981657Z #define __bswap_16(x) (__extension__ ({ unsigned short int __v, __x = (unsigned short int) (x); if (__builtin_constant_p (__x)) __v = __bswap_constant_16 (__x); else __asm__ ("rorw $8, %w0" : "=r" (__v) : "0" (__x) : "cc"); __v; })) 2025-05-07T19:46:54.5982167Z #define __bswap_32(x) (__extension__ ({ unsigned int __v, __x = (x); if (__builtin_constant_p (__x)) __v = __bswap_constant_32 (__x); else __asm__ ("bswap %0" : "=r" (__v) : "0" (__x)); __v; })) 2025-05-07T19:46:54.5982673Z #define __bswap_64(x) (__extension__ ({ __uint64_t __v, __x = (x); if (__builtin_constant_p (__x)) __v = __bswap_constant_64 (__x); else __asm__ ("bswap %q0" : "=r" (__v) : "0" (__x)); __v; })) 2025-05-07T19:46:54.5982948Z #define __bswap_constant_16(x) ((unsigned short int) ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))) 2025-05-07T19:46:54.5983354Z #define __bswap_constant_32(x) ((((x) & 0xff000000) >> 24) | (((x) & 0x00ff0000) >> 8) | (((x) & 0x0000ff00) << 8) | (((x) & 0x000000ff) << 24)) 2025-05-07T19:46:54.5984388Z #define __bswap_constant_64(x) (__extension__ ((((x) & 0xff00000000000000ull) >> 56) | (((x) & 0x00ff000000000000ull) >> 40) | (((x) & 0x0000ff0000000000ull) >> 24) | (((x) & 0x000000ff00000000ull) >> 8) | (((x) & 0x00000000ff000000ull) << 8) | (((x) & 0x0000000000ff0000ull) << 24) | (((x) & 0x000000000000ff00ull) << 40) | (((x) & 0x00000000000000ffull) << 56))) 2025-05-07T19:46:54.5984501Z #define __builtin_align__(a) __align__(a) 2025-05-07T19:46:54.5984593Z #define __catch(X) catch(X) 2025-05-07T19:46:54.5984686Z #define __cdecl 
2025-05-07T19:46:54.5984771Z #define __clang__ 1 2025-05-07T19:46:54.5984883Z #define __clang_literal_encoding__ "UTF-8" 2025-05-07T19:46:54.5984976Z #define __clang_major__ 16 2025-05-07T19:46:54.5985074Z #define __clang_minor__ 0 2025-05-07T19:46:54.5985169Z #define __clang_patchlevel__ 6 2025-05-07T19:46:54.5985604Z #define __clang_version__ "16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:46:54.5985749Z #define __clang_wide_literal_encoding__ "UTF-32" 2025-05-07T19:46:54.5985840Z #define __clock_t_defined 1 2025-05-07T19:46:54.5985932Z #define __clockid_t_defined 1 2025-05-07T19:46:54.5986145Z #define __cluster_dims__(...) __attribute__((cluster_dims(__VA_ARGS__))) 2025-05-07T19:46:54.5986237Z #define __code_model_small__ 1 2025-05-07T19:46:54.5986351Z #define __constant__ __location__(constant) 2025-05-07T19:46:54.5986445Z #define __cplusplus 201703L 2025-05-07T19:46:54.5986567Z #define __cpp_aggregate_bases 201603L 2025-05-07T19:46:54.5986671Z #define __cpp_aggregate_nsdmi 201304L 2025-05-07T19:46:54.5986770Z #define __cpp_alias_templates 200704L 2025-05-07T19:46:54.5986882Z #define __cpp_aligned_new 201606L 2025-05-07T19:46:54.5986980Z #define __cpp_attributes 200809L 2025-05-07T19:46:54.5987079Z #define __cpp_binary_literals 201304L 2025-05-07T19:46:54.5987185Z #define __cpp_capture_star_this 201603L 2025-05-07T19:46:54.5987356Z #define __cpp_constexpr 201603L 2025-05-07T19:46:54.5987471Z #define __cpp_constexpr_in_decltype 201711L 2025-05-07T19:46:54.5987572Z #define __cpp_decltype 200707L 2025-05-07T19:46:54.5987682Z #define __cpp_decltype_auto 201304L 2025-05-07T19:46:54.5987787Z #define __cpp_deduction_guides 201703L 2025-05-07T19:46:54.5987908Z #define __cpp_delegating_constructors 200604L 2025-05-07T19:46:54.5988011Z #define __cpp_digit_separators 201309L 2025-05-07T19:46:54.5988135Z #define __cpp_enumerator_attributes 201411L 2025-05-07T19:46:54.5988236Z #define __cpp_exceptions 
199711L 2025-05-07T19:46:54.5988339Z #define __cpp_fold_expressions 201603L 2025-05-07T19:46:54.5988455Z #define __cpp_generic_lambdas 201304L 2025-05-07T19:46:54.5988577Z #define __cpp_guaranteed_copy_elision 201606L 2025-05-07T19:46:54.5988672Z #define __cpp_hex_float 201603L 2025-05-07T19:46:54.5988779Z #define __cpp_if_constexpr 201606L 2025-05-07T19:46:54.5988895Z #define __cpp_impl_destroying_delete 201806L 2025-05-07T19:46:54.5989027Z #define __cpp_inheriting_constructors 201511L 2025-05-07T19:46:54.5989230Z #define __cpp_init_captures 201304L 2025-05-07T19:46:54.5989339Z #define __cpp_initializer_lists 200806L 2025-05-07T19:46:54.5989430Z #define __cpp_inline_variables 201606L 2025-05-07T19:46:54.5989518Z #define __cpp_lambdas 200907L 2025-05-07T19:46:54.5989635Z #define __cpp_lib_addressof_constexpr 201603 2025-05-07T19:46:54.5989736Z #define __cpp_lib_array_constexpr 201803L 2025-05-07T19:46:54.5989827Z #define __cpp_lib_as_const 201510 2025-05-07T19:46:54.5989918Z #define __cpp_lib_bool_constant 201505 2025-05-07T19:46:54.5990027Z #define __cpp_lib_exchange_function 201304 2025-05-07T19:46:54.5990179Z #define __cpp_lib_has_unique_object_representations 201606 2025-05-07T19:46:54.5990269Z #define __cpp_lib_hypot 201603 2025-05-07T19:46:54.5990376Z #define __cpp_lib_integer_sequence 201304 2025-05-07T19:46:54.5990501Z #define __cpp_lib_integral_constant_callable 201304 2025-05-07T19:46:54.5990598Z #define __cpp_lib_is_aggregate 201703 2025-05-07T19:46:54.5990746Z #define __cpp_lib_is_final 201402L 2025-05-07T19:46:54.5990854Z #define __cpp_lib_is_invocable 201703 2025-05-07T19:46:54.5990953Z #define __cpp_lib_is_null_pointer 201309 2025-05-07T19:46:54.5991049Z #define __cpp_lib_is_swappable 201603 2025-05-07T19:46:54.5991147Z #define __cpp_lib_launder 201606 2025-05-07T19:46:54.5991244Z #define __cpp_lib_logical_traits 201510 2025-05-07T19:46:54.5991358Z #define __cpp_lib_make_reverse_iterator 201402 2025-05-07T19:46:54.5991477Z #define 
__cpp_lib_math_special_functions 201603L 2025-05-07T19:46:54.5991583Z #define __cpp_lib_result_of_sfinae 201210 2025-05-07T19:46:54.5991714Z #define __cpp_lib_robust_nonmodifying_seq_ops 201304 2025-05-07T19:46:54.5991848Z #define __cpp_lib_transformation_trait_aliases 201304 2025-05-07T19:46:54.5991961Z #define __cpp_lib_tuple_element_t 201402L 2025-05-07T19:46:54.5992057Z #define __cpp_lib_tuples_by_type 201304 2025-05-07T19:46:54.5992198Z #define __cpp_lib_type_trait_variable_templates 201510L 2025-05-07T19:46:54.5992314Z #define __cpp_lib_void_t 201411 2025-05-07T19:46:54.5992426Z #define __cpp_named_character_escapes 202207L 2025-05-07T19:46:54.5992531Z #define __cpp_namespace_attributes 201411L 2025-05-07T19:46:54.5992657Z #define __cpp_nested_namespace_definitions 201411L 2025-05-07T19:46:54.5992776Z #define __cpp_noexcept_function_type 201510L 2025-05-07T19:46:54.5992880Z #define __cpp_nontype_template_args 201411L 2025-05-07T19:46:54.5993013Z #define __cpp_nontype_template_parameter_auto 201606L 2025-05-07T19:46:54.5993110Z #define __cpp_nsdmi 200809L 2025-05-07T19:46:54.5993199Z #define __cpp_range_based_for 201603L 2025-05-07T19:46:54.5993290Z #define __cpp_raw_strings 200710L 2025-05-07T19:46:54.5993383Z #define __cpp_ref_qualifiers 200710L 2025-05-07T19:46:54.5993496Z #define __cpp_return_type_deduction 201304L 2025-05-07T19:46:54.5993580Z #define __cpp_rtti 199711L 2025-05-07T19:46:54.5993674Z #define __cpp_rvalue_references 200610L 2025-05-07T19:46:54.5993780Z #define __cpp_static_assert 201411L 2025-05-07T19:46:54.5993938Z #define __cpp_static_call_operator 202207L 2025-05-07T19:46:54.5994041Z #define __cpp_structured_bindings 201606L 2025-05-07T19:46:54.5994149Z #define __cpp_template_auto 201606L 2025-05-07T19:46:54.5994255Z #define __cpp_threadsafe_static_init 200806L 2025-05-07T19:46:54.5994355Z #define __cpp_unicode_characters 200704L 2025-05-07T19:46:54.5994451Z #define __cpp_unicode_literals 200710L 2025-05-07T19:46:54.5994566Z #define 
__cpp_user_defined_literals 200809L 2025-05-07T19:46:54.5994661Z #define __cpp_variable_templates 201304L 2025-05-07T19:46:54.5994756Z #define __cpp_variadic_templates 200704L 2025-05-07T19:46:54.5994859Z #define __cpp_variadic_using 201611L 2025-05-07T19:46:54.5994961Z #define __cudaCDP2DeviceGetAttribute 2025-05-07T19:46:54.5995066Z #define __cudaCDP2DeviceGetCacheConfig 2025-05-07T19:46:54.5995163Z #define __cudaCDP2DeviceGetLimit 2025-05-07T19:46:54.5995288Z #define __cudaCDP2DeviceGetSharedMemConfig 2025-05-07T19:46:54.5995390Z #define __cudaCDP2EventCreateWithFlags 2025-05-07T19:46:54.5995488Z #define __cudaCDP2EventDestroy 2025-05-07T19:46:54.5995593Z #define __cudaCDP2EventRecord 2025-05-07T19:46:54.5995697Z #define __cudaCDP2EventRecordWithFlags 2025-05-07T19:46:54.5995812Z #define __cudaCDP2EventRecordWithFlags_ptsz 2025-05-07T19:46:54.5995906Z #define __cudaCDP2EventRecord_ptsz 2025-05-07T19:46:54.5996004Z #define __cudaCDP2Free 2025-05-07T19:46:54.5996104Z #define __cudaCDP2FuncGetAttributes 2025-05-07T19:46:54.5996193Z #define __cudaCDP2GetDevice 2025-05-07T19:46:54.5996299Z #define __cudaCDP2GetDeviceCount 2025-05-07T19:46:54.5996391Z #define __cudaCDP2GetErrorName 2025-05-07T19:46:54.5996484Z #define __cudaCDP2GetErrorString 2025-05-07T19:46:54.5996572Z #define __cudaCDP2GetLastError 2025-05-07T19:46:54.5996690Z #define __cudaCDP2GetParameterBuffer 2025-05-07T19:46:54.5996796Z #define __cudaCDP2GetParameterBufferV2 2025-05-07T19:46:54.5996884Z #define __cudaCDP2LaunchDevice 2025-05-07T19:46:54.5996985Z #define __cudaCDP2LaunchDeviceV2 2025-05-07T19:46:54.5997160Z #define __cudaCDP2LaunchDeviceV2_ptsz 2025-05-07T19:46:54.5997259Z #define __cudaCDP2LaunchDevice_ptsz 2025-05-07T19:46:54.5997345Z #define __cudaCDP2Malloc 2025-05-07T19:46:54.5997450Z #define __cudaCDP2Memcpy2DAsync 2025-05-07T19:46:54.5997551Z #define __cudaCDP2Memcpy2DAsync_ptsz 2025-05-07T19:46:54.5997646Z #define __cudaCDP2Memcpy3DAsync 2025-05-07T19:46:54.5997755Z #define 
__cudaCDP2Memcpy3DAsync_ptsz 2025-05-07T19:46:54.5997849Z #define __cudaCDP2MemcpyAsync 2025-05-07T19:46:54.5997948Z #define __cudaCDP2MemcpyAsync_ptsz 2025-05-07T19:46:54.5998038Z #define __cudaCDP2Memset2DAsync 2025-05-07T19:46:54.5998160Z #define __cudaCDP2Memset2DAsync_ptsz 2025-05-07T19:46:54.5998254Z #define __cudaCDP2Memset3DAsync 2025-05-07T19:46:54.5998352Z #define __cudaCDP2Memset3DAsync_ptsz 2025-05-07T19:46:54.5998452Z #define __cudaCDP2MemsetAsync 2025-05-07T19:46:54.5998548Z #define __cudaCDP2MemsetAsync_ptsz 2025-05-07T19:46:54.5998739Z #define __cudaCDP2OccupancyMaxActiveBlocksPerMultiprocessor 2025-05-07T19:46:54.5998990Z #define __cudaCDP2OccupancyMaxActiveBlocksPerMultiprocessorWithFlags 2025-05-07T19:46:54.5999088Z #define __cudaCDP2PeekAtLastError 2025-05-07T19:46:54.5999189Z #define __cudaCDP2RuntimeGetVersion 2025-05-07T19:46:54.5999297Z #define __cudaCDP2StreamCreateWithFlags 2025-05-07T19:46:54.5999404Z #define __cudaCDP2StreamDestroy 2025-05-07T19:46:54.5999501Z #define __cudaCDP2StreamWaitEvent 2025-05-07T19:46:54.5999607Z #define __cudaCDP2StreamWaitEvent_ptsz 2025-05-07T19:46:54.5999714Z #define __cudaGet_blockDim() blockDim 2025-05-07T19:46:54.5999806Z #define __cudaGet_blockIdx() blockIdx 2025-05-07T19:46:54.5999896Z #define __cudaGet_gridDim() gridDim 2025-05-07T19:46:54.5999992Z #define __cudaGet_threadIdx() threadIdx 2025-05-07T19:46:54.6000094Z #define __cudaGet_warpSize() warpSize 2025-05-07T19:46:54.6000230Z #define __cudart_builtin__ __location__(cudart_builtin) 2025-05-07T19:46:54.6000315Z #define __daddr_t_defined 2025-05-07T19:46:54.6000909Z #define __dev_t_defined 2025-05-07T19:46:54.6001005Z #define __device__ __location__(device) 2025-05-07T19:46:54.6001140Z #define __device_builtin__ __location__(device_builtin) 2025-05-07T19:46:54.6001365Z #define __device_builtin_surface_type__ __location__(device_builtin_surface_type) 2025-05-07T19:46:54.6001600Z #define __device_builtin_texture_type__ 
__location__(device_builtin_texture_type) 2025-05-07T19:46:54.6001735Z #define __errordecl(name,msg) extern void name (void) 2025-05-07T19:46:54.6001866Z #define __exctype(name) extern int name (int) __THROW 2025-05-07T19:46:54.6002057Z #define __exctype_l(name) extern int name (int, __locale_t) __THROW 2025-05-07T19:46:54.6002165Z #define __export__ 2025-05-07T19:46:54.6002412Z #define __extern_always_inline extern __always_inline __attribute__ ((__gnu_inline__)) 2025-05-07T19:46:54.6002645Z #define __extern_inline extern __inline __attribute__ ((__gnu_inline__)) 2025-05-07T19:46:54.6002746Z #define __flexarr [] 2025-05-07T19:46:54.6002936Z #define __forceinline__ __inline__ __attribute__((always_inline)) 2025-05-07T19:46:54.6003186Z #define __fortify_function __extern_always_inline __attribute_artificial__ 2025-05-07T19:46:54.6003296Z #define __fsblkcnt_t_defined 2025-05-07T19:46:54.6003404Z #define __fsfilcnt_t_defined 2025-05-07T19:46:54.6003502Z #define __gid_t_defined 2025-05-07T19:46:54.6003683Z #define __glibc_likely(cond) __builtin_expect((cond), 1) 2025-05-07T19:46:54.6003848Z #define __glibc_unlikely(cond) __builtin_expect((cond), 0) 2025-05-07T19:46:54.6004091Z #define __glibcxx_assert(cond) do { __glibcxx_constexpr_assert(cond); } while (false) 2025-05-07T19:46:54.6004235Z #define __glibcxx_class_requires(_a,_b) 2025-05-07T19:46:54.6004361Z #define __glibcxx_class_requires2(_a,_b,_c) 2025-05-07T19:46:54.6004503Z #define __glibcxx_class_requires3(_a,_b,_c,_d) 2025-05-07T19:46:54.6004641Z #define __glibcxx_class_requires4(_a,_b,_c,_d,_e) 2025-05-07T19:46:54.6005140Z #define __glibcxx_constexpr_assert(cond) if (__builtin_is_constant_evaluated() && !bool(cond)) __builtin_unreachable() 2025-05-07T19:46:54.6005355Z #define __glibcxx_digits10_b(T,B) (__glibcxx_digits_b (T,B) * 643L / 2136) 2025-05-07T19:46:54.6005534Z #define __glibcxx_digits_b(T,B) (B - __glibcxx_signed_b (T,B)) 2025-05-07T19:46:54.6005682Z #define __glibcxx_function_requires(...) 
2025-05-07T19:46:54.6005799Z #define __glibcxx_integral_traps true 2025-05-07T19:46:54.6006117Z #define __glibcxx_max_b(T,B) (__glibcxx_signed_b (T,B) ? (((((T)1 << (__glibcxx_digits_b (T,B) - 1)) - 1) << 1) + 1) : ~(T)0) 2025-05-07T19:46:54.6006402Z #define __glibcxx_min_b(T,B) (__glibcxx_signed_b (T,B) ? -__glibcxx_max_b (T,B) - 1 : (T)0) 2025-05-07T19:46:54.6006769Z #define __glibcxx_requires_can_decrement_range(_First1,_Last1,_First2) 2025-05-07T19:46:54.6006933Z #define __glibcxx_requires_can_increment(_First,_Size) 2025-05-07T19:46:54.6007183Z #define __glibcxx_requires_can_increment_range(_First1,_Last1,_First2) 2025-05-07T19:46:54.6007309Z #define __glibcxx_requires_cond(_Cond,_Msg) 2025-05-07T19:46:54.6007448Z #define __glibcxx_requires_heap(_First,_Last) 2025-05-07T19:46:54.6007624Z #define __glibcxx_requires_heap_pred(_First,_Last,_Pred) 2025-05-07T19:46:54.6007783Z #define __glibcxx_requires_irreflexive(_First,_Last) 2025-05-07T19:46:54.6007928Z #define __glibcxx_requires_irreflexive2(_First,_Last) 2025-05-07T19:46:54.6008121Z #define __glibcxx_requires_irreflexive_pred(_First,_Last,_Pred) 2025-05-07T19:46:54.6008336Z #define __glibcxx_requires_irreflexive_pred2(_First,_Last,_Pred) 2025-05-07T19:46:54.6008497Z #define __glibcxx_requires_non_empty_range(_First,_Last) 2025-05-07T19:46:54.6008601Z #define __glibcxx_requires_nonempty() 2025-05-07T19:46:54.6008818Z #define __glibcxx_requires_partitioned_lower(_First,_Last,_Value) 2025-05-07T19:46:54.6009060Z #define __glibcxx_requires_partitioned_lower_pred(_First,_Last,_Value,_Pred) 2025-05-07T19:46:54.6009259Z #define __glibcxx_requires_partitioned_upper(_First,_Last,_Value) 2025-05-07T19:46:54.6009489Z #define __glibcxx_requires_partitioned_upper_pred(_First,_Last,_Value,_Pred) 2025-05-07T19:46:54.6009706Z #define __glibcxx_requires_sorted(_First,_Last) 2025-05-07T19:46:54.6009862Z #define __glibcxx_requires_sorted_pred(_First,_Last,_Pred) 2025-05-07T19:46:54.6010043Z #define 
__glibcxx_requires_sorted_set(_First1,_Last1,_First2) 2025-05-07T19:46:54.6010296Z #define __glibcxx_requires_sorted_set_pred(_First1,_Last1,_First2,_Pred) 2025-05-07T19:46:54.6010419Z #define __glibcxx_requires_string(_String) 2025-05-07T19:46:54.6010574Z #define __glibcxx_requires_string_len(_String,_Len) 2025-05-07T19:46:54.6010726Z #define __glibcxx_requires_subscript(_N) 2025-05-07T19:46:54.6010872Z #define __glibcxx_requires_valid_range(_First,_Last) 2025-05-07T19:46:54.6011004Z #define __glibcxx_signed_b(T,B) ((T)(-1) < 0) 2025-05-07T19:46:54.6011119Z #define __global__ __location__(global) 2025-05-07T19:46:54.6011243Z #define __gnu_linux__ 1 2025-05-07T19:46:54.6011389Z #define __grid_constant__ __location__(grid_constant) 2025-05-07T19:46:54.6011500Z #define __have_pthread_attr_t 1 2025-05-07T19:46:54.6011640Z #define __host__ __location__(host) 2025-05-07T19:46:54.6011733Z #define __id_t_defined 2025-05-07T19:46:54.6011825Z #define __import__ 2025-05-07T19:46:54.6011977Z #define __inline_hint__ __attribute__((nv_inline_hint)) 2025-05-07T19:46:54.6012110Z #define __ino64_t_defined 2025-05-07T19:46:54.6012208Z #define __ino_t_defined 2025-05-07T19:46:54.6012307Z #define __int8_t_defined 2025-05-07T19:46:54.6012552Z #define __intN_t(N,MODE) typedef int int##N##_t __attribute__ ((__mode__ (MODE))) 2025-05-07T19:46:54.6012699Z #define __isalnum_l(c,l) __isctype_l((c), _ISalnum, (l)) 2025-05-07T19:46:54.6012842Z #define __isalpha_l(c,l) __isctype_l((c), _ISalpha, (l)) 2025-05-07T19:46:54.6012959Z #define __isascii(c) (((c) & ~0x7f) == 0) 2025-05-07T19:46:54.6013103Z #define __isascii_l(c,l) ((l), __isascii (c)) 2025-05-07T19:46:54.6013253Z #define __isblank_l(c,l) __isctype_l((c), _ISblank, (l)) 2025-05-07T19:46:54.6013405Z #define __iscntrl_l(c,l) __isctype_l((c), _IScntrl, (l)) 2025-05-07T19:46:54.6013731Z #define __isctype_l(c,type,locale) ((locale)->__ctype_b[(int) (c)] & (unsigned short int) type) 2025-05-07T19:46:54.6013871Z #define __isdigit_l(c,l) 
__isctype_l((c), _ISdigit, (l)) 2025-05-07T19:46:54.6014028Z #define __isgraph_l(c,l) __isctype_l((c), _ISgraph, (l)) 2025-05-07T19:46:54.6014252Z #define __isleap(year) ((year) % 4 == 0 && ((year) % 100 != 0 || (year) % 400 == 0)) 2025-05-07T19:46:54.6014404Z #define __islower_l(c,l) __isctype_l((c), _ISlower, (l)) 2025-05-07T19:46:54.6014544Z #define __isprint_l(c,l) __isctype_l((c), _ISprint, (l)) 2025-05-07T19:46:54.6014718Z #define __ispunct_l(c,l) __isctype_l((c), _ISpunct, (l)) 2025-05-07T19:46:54.6014863Z #define __isspace_l(c,l) __isctype_l((c), _ISspace, (l)) 2025-05-07T19:46:54.6015020Z #define __isupper_l(c,l) __isctype_l((c), _ISupper, (l)) 2025-05-07T19:46:54.6015171Z #define __isxdigit_l(c,l) __isctype_l((c), _ISxdigit, (l)) 2025-05-07T19:46:54.6015289Z #define __k8 1 2025-05-07T19:46:54.6015373Z #define __k8__ 1 2025-05-07T19:46:54.6015475Z #define __key_t_defined 2025-05-07T19:46:54.6015695Z #define __launch_bounds__(...) __annotate__(launch_bounds(__VA_ARGS__)) 2025-05-07T19:46:54.6015798Z #define __ldiv_t_defined 1 2025-05-07T19:46:54.6015877Z #define __linux 1 2025-05-07T19:46:54.6015976Z #define __linux__ 1 2025-05-07T19:46:54.6016087Z #define __lldiv_t_defined 1 2025-05-07T19:46:54.6016175Z #define __llvm__ 1 2025-05-07T19:46:54.6016285Z #define __location__(a) __annotate__(a) 2025-05-07T19:46:54.6016409Z #define __long_double_t long double 2025-05-07T19:46:54.6016520Z #define __malloc_and_calloc_defined 2025-05-07T19:46:54.6016640Z #define __managed__ __location__(managed) 2025-05-07T19:46:54.6016764Z #define __maxnreg__(a) __attribute__((maxnreg(a))) 2025-05-07T19:46:54.6016869Z #define __mode_t_defined 2025-05-07T19:46:54.6016968Z #define __need_IOV_MAX 2025-05-07T19:46:54.6017068Z #define __need_clockid_t 2025-05-07T19:46:54.6017256Z #define __nlink_t_defined 2025-05-07T19:46:54.6017448Z #define __no_return__ __attribute__((noreturn)) 2025-05-07T19:46:54.6017750Z #define __noinline__ __attribute__((noinline)) 2025-05-07T19:46:54.6017942Z 
#define __nonnull(params) __attribute__ ((__nonnull__ params)) 2025-05-07T19:46:54.6018074Z #define __nv_pure__ __location__(nv_pure) 2025-05-07T19:46:54.6018186Z #define __off64_t_defined 2025-05-07T19:46:54.6018274Z #define __off_t_defined 2025-05-07T19:46:54.6018400Z #define __pic__ 2 2025-05-07T19:46:54.6018500Z #define __pid_t_defined 2025-05-07T19:46:54.6018586Z #define __pie__ 2 2025-05-07T19:46:54.6018698Z #define __private_extern__ extern 2025-05-07T19:46:54.6018805Z #define __ptr_t void * 2025-05-07T19:46:54.6018902Z #define __ptrvalue 2025-05-07T19:46:54.6018988Z #define __restrict_arr 2025-05-07T19:46:54.6019154Z #define __seg_fs __attribute__((address_space(257))) 2025-05-07T19:46:54.6019303Z #define __seg_gs __attribute__((address_space(256))) 2025-05-07T19:46:54.6019421Z #define __shared__ __location__(shared) 2025-05-07T19:46:54.6019528Z #define __sigset_t_defined 2025-05-07T19:46:54.6019672Z #define __specialization_static 2025-05-07T19:46:54.6019777Z #define __ssize_t_defined 2025-05-07T19:46:54.6019871Z #define __stub_bdflush 2025-05-07T19:46:54.6019995Z #define __stub_chflags 2025-05-07T19:46:54.6020092Z #define __stub_fattach 2025-05-07T19:46:54.6020192Z #define __stub_fchflags 2025-05-07T19:46:54.6020282Z #define __stub_fdetach 2025-05-07T19:46:54.6020404Z #define __stub_getmsg 2025-05-07T19:46:54.6020502Z #define __stub_gtty 2025-05-07T19:46:54.6020605Z #define __stub_lchmod 2025-05-07T19:46:54.6020725Z #define __stub_putmsg 2025-05-07T19:46:54.6020819Z #define __stub_revoke 2025-05-07T19:46:54.6020918Z #define __stub_setlogin 2025-05-07T19:46:54.6021012Z #define __stub_sigreturn 2025-05-07T19:46:54.6021123Z #define __stub_sstk 2025-05-07T19:46:54.6021221Z #define __stub_stty 2025-05-07T19:46:54.6021329Z #define __suseconds_t_defined 2025-05-07T19:46:54.6021451Z #define __thread__ __thread 2025-05-07T19:46:54.6021628Z #define __throw_exception_again throw 2025-05-07T19:46:54.6021733Z #define __time_t_defined 1 2025-05-07T19:46:54.6021834Z 
#define __timer_t_defined 1 2025-05-07T19:46:54.6021966Z #define __timespec_defined 1 2025-05-07T19:46:54.6022076Z #define __toascii(c) ((c) & 0x7f) 2025-05-07T19:46:54.6022204Z #define __toascii_l(c,l) ((l), __toascii (c)) 2025-05-07T19:46:54.6022823Z #define __tobody(c,f,a,args) (__extension__ ({ int __res; if (sizeof (c) > 1) { if (__builtin_constant_p (c)) { int __c = (c); __res = __c < -128 || __c > 255 ? __c : (a)[__c]; } else __res = f args; } else __res = (a)[(int) (c)]; __res; })) 2025-05-07T19:46:54.6022920Z #define __try try 2025-05-07T19:46:54.6023017Z #define __tune_k8__ 1 2025-05-07T19:46:54.6023148Z #define __u_char_defined 2025-05-07T19:46:54.6023436Z #define __u_intN_t(N,MODE) typedef unsigned int u_int##N##_t __attribute__ ((__mode__ (MODE))) 2025-05-07T19:46:54.6023541Z #define __uid_t_defined 2025-05-07T19:46:54.6023639Z #define __unbounded 2025-05-07T19:46:54.6023753Z #define __unix 1 2025-05-07T19:46:54.6023850Z #define __unix__ 1 2025-05-07T19:46:54.6023955Z #define __useconds_t_defined 2025-05-07T19:46:54.6024077Z #define __warnattr(msg) 2025-05-07T19:46:54.6024226Z #define __warndecl(name,msg) extern void name (void) 2025-05-07T19:46:54.6024320Z #define __wur 2025-05-07T19:46:54.6024414Z #define __x86_64 1 2025-05-07T19:46:54.6024531Z #define __x86_64__ 1 2025-05-07T19:46:54.6024715Z #define _tolower(c) ((int) (*__ctype_tolower_loc ())[(int) (c)]) 2025-05-07T19:46:54.6024896Z #define _toupper(c) ((int) (*__ctype_toupper_loc ())[(int) (c)]) 2025-05-07T19:46:54.6025047Z #define alloca(size) __builtin_alloca (size) 2025-05-07T19:46:54.6025420Z #define assert(expr) ((expr) ? __ASSERT_VOID_CAST (0) : __assert_fail (__STRING(expr), __FILE__, __LINE__, __ASSERT_FUNCTION)) 2025-05-07T19:46:54.6025853Z #define assert_perror(errnum) (!(errnum) ? 
__ASSERT_VOID_CAST (0) : __assert_perror_fail ((errnum), __FILE__, __LINE__, __ASSERT_FUNCTION)) 2025-05-07T19:46:54.6026047Z #define be16toh(x) __bswap_16 (x) 2025-05-07T19:46:54.6026153Z #define be32toh(x) __bswap_32 (x) 2025-05-07T19:46:54.6026258Z #define be64toh(x) __bswap_64 (x) 2025-05-07T19:46:54.6026381Z #define cudaArrayColorAttachment 0x20 2025-05-07T19:46:54.6026521Z #define cudaArrayCubemap 0x04 2025-05-07T19:46:54.6026624Z #define cudaArrayDefault 0x00 2025-05-07T19:46:54.6026742Z #define cudaArrayDeferredMapping 0x80 2025-05-07T19:46:54.6026879Z #define cudaArrayLayered 0x01 2025-05-07T19:46:54.6026991Z #define cudaArraySparse 0x40 2025-05-07T19:46:54.6027158Z #define cudaArraySparsePropertiesSingleMipTail 0x1 2025-05-07T19:46:54.6027281Z #define cudaArraySurfaceLoadStore 0x02 2025-05-07T19:46:54.6027429Z #define cudaArrayTextureGather 0x08 2025-05-07T19:46:54.6027622Z #define cudaCooperativeLaunchMultiDeviceNoPostSync 0x02 2025-05-07T19:46:54.6027812Z #define cudaCooperativeLaunchMultiDeviceNoPreSync 0x01 2025-05-07T19:46:54.6027954Z #define cudaCpuDeviceId ((int)-1) 2025-05-07T19:46:54.6028070Z #define cudaDeviceBlockingSync 0x04 2025-05-07T19:46:54.6028197Z #define cudaDeviceLmemResizeToMax 0x10 2025-05-07T19:46:54.6028307Z #define cudaDeviceMapHost 0x08 2025-05-07T19:46:54.6028445Z #define cudaDeviceMask 0xff 2025-05-07T19:46:54.6028564Z #define cudaDeviceScheduleAuto 0x00 2025-05-07T19:46:54.6028706Z #define cudaDeviceScheduleBlockingSync 0x04 2025-05-07T19:46:54.6028859Z #define cudaDeviceScheduleMask 0x07 2025-05-07T19:46:54.6028976Z #define cudaDeviceScheduleSpin 0x01 2025-05-07T19:46:54.6029098Z #define cudaDeviceScheduleYield 0x02 2025-05-07T19:46:54.6029223Z #define cudaDeviceSyncMemops 0x80 2025-05-07T19:46:54.6029377Z #define cudaEventBlockingSync 0x01 2025-05-07T19:46:54.6029489Z #define cudaEventDefault 0x00 2025-05-07T19:46:54.6029607Z #define cudaEventDisableTiming 0x02 2025-05-07T19:46:54.6029858Z #define cudaEventInterprocess 0x04 
2025-05-07T19:46:54.6029970Z #define cudaEventRecordDefault 0x00 2025-05-07T19:46:54.6030082Z #define cudaEventRecordExternal 0x01 2025-05-07T19:46:54.6030220Z #define cudaEventWaitDefault 0x00 2025-05-07T19:46:54.6030403Z #define cudaEventWaitExternal 0x01 2025-05-07T19:46:54.6030529Z #define cudaExternalMemoryDedicated 0x1 2025-05-07T19:46:54.6030717Z #define cudaExternalSemaphoreSignalSkipNvSciBufMemSync 0x01 2025-05-07T19:46:54.6030929Z #define cudaExternalSemaphoreWaitSkipNvSciBufMemSync 0x02 2025-05-07T19:46:54.6031115Z #define cudaGetDeviceProperties cudaGetDeviceProperties_v2 2025-05-07T19:46:54.6031249Z #define cudaGraphKernelNodePortDefault 0 2025-05-07T19:46:54.6031429Z #define cudaGraphKernelNodePortLaunchCompletion 2 2025-05-07T19:46:54.6031573Z #define cudaGraphKernelNodePortProgrammatic 1 2025-05-07T19:46:54.6031692Z #define cudaHostAllocDefault 0x00 2025-05-07T19:46:54.6031804Z #define cudaHostAllocMapped 0x02 2025-05-07T19:46:54.6031945Z #define cudaHostAllocPortable 0x01 2025-05-07T19:46:54.6032068Z #define cudaHostAllocWriteCombined 0x04 2025-05-07T19:46:54.6032183Z #define cudaHostRegisterDefault 0x00 2025-05-07T19:46:54.6032331Z #define cudaHostRegisterIoMemory 0x04 2025-05-07T19:46:54.6032454Z #define cudaHostRegisterMapped 0x02 2025-05-07T19:46:54.6032573Z #define cudaHostRegisterPortable 0x01 2025-05-07T19:46:54.6032682Z #define cudaHostRegisterReadOnly 0x08 2025-05-07T19:46:54.6032834Z #define cudaInitDeviceFlagsAreValid 0x01 2025-05-07T19:46:54.6032947Z #define cudaInvalidDeviceId ((int)-2) 2025-05-07T19:46:54.6033080Z #define cudaIpcMemLazyEnablePeerAccess 0x01 2025-05-07T19:46:54.6033259Z #define cudaKernelNodeAttrID cudaLaunchAttributeID 2025-05-07T19:46:54.6033436Z #define cudaKernelNodeAttrValue cudaLaunchAttributeValue 2025-05-07T19:46:54.6033762Z #define cudaKernelNodeAttributeAccessPolicyWindow cudaLaunchAttributeAccessPolicyWindow 2025-05-07T19:46:54.6034090Z #define cudaKernelNodeAttributeClusterDimension 
cudaLaunchAttributeClusterDimension 2025-05-07T19:46:54.6034582Z #define cudaKernelNodeAttributeClusterSchedulingPolicyPreference cudaLaunchAttributeClusterSchedulingPolicyPreference 2025-05-07T19:46:54.6034845Z #define cudaKernelNodeAttributeCooperative cudaLaunchAttributeCooperative 2025-05-07T19:46:54.6035321Z #define cudaKernelNodeAttributeDeviceUpdatableKernelNode cudaLaunchAttributeDeviceUpdatableKernelNode 2025-05-07T19:46:54.6035588Z #define cudaKernelNodeAttributeMemSyncDomain cudaLaunchAttributeMemSyncDomain 2025-05-07T19:46:54.6035899Z #define cudaKernelNodeAttributeMemSyncDomainMap cudaLaunchAttributeMemSyncDomainMap 2025-05-07T19:46:54.6036351Z #define cudaKernelNodeAttributePreferredSharedMemoryCarveout cudaLaunchAttributePreferredSharedMemoryCarveout 2025-05-07T19:46:54.6036610Z #define cudaKernelNodeAttributePriority cudaLaunchAttributePriority 2025-05-07T19:46:54.6036722Z #define cudaMemAttachGlobal 0x01 2025-05-07T19:46:54.6036836Z #define cudaMemAttachHost 0x02 2025-05-07T19:46:54.6036979Z #define cudaMemAttachSingle 0x04 2025-05-07T19:46:54.6037122Z #define cudaMemPoolCreateUsageHwDecompress 0x2 2025-05-07T19:46:54.6037238Z #define cudaNvSciSyncAttrSignal 0x1 2025-05-07T19:46:54.6037384Z #define cudaNvSciSyncAttrWait 0x2 2025-05-07T19:46:54.6037502Z #define cudaOccupancyDefault 0x00 2025-05-07T19:46:54.6037655Z #define cudaOccupancyDisableCachingOverride 0x01 2025-05-07T19:46:54.6037773Z #define cudaPeerAccessDefault 0x00 2025-05-07T19:46:54.6038158Z #define cudaSignalExternalSemaphoresAsync __CUDART_API_PTSZ(cudaSignalExternalSemaphoresAsync_v2) 2025-05-07T19:46:54.6038296Z #define cudaStreamAttrID cudaLaunchAttributeID 2025-05-07T19:46:54.6038459Z #define cudaStreamAttrValue cudaLaunchAttributeValue 2025-05-07T19:46:54.6038791Z #define cudaStreamAttributeAccessPolicyWindow cudaLaunchAttributeAccessPolicyWindow 2025-05-07T19:46:54.6039041Z #define cudaStreamAttributeMemSyncDomain cudaLaunchAttributeMemSyncDomain 2025-05-07T19:46:54.6039324Z 
#define cudaStreamAttributeMemSyncDomainMap cudaLaunchAttributeMemSyncDomainMap 2025-05-07T19:46:54.6039573Z #define cudaStreamAttributePriority cudaLaunchAttributePriority 2025-05-07T19:46:54.6039906Z #define cudaStreamAttributeSynchronizationPolicy cudaLaunchAttributeSynchronizationPolicy 2025-05-07T19:46:54.6040068Z #define cudaStreamDefault 0x00 2025-05-07T19:46:54.6040218Z #define cudaStreamFireAndForget ((cudaStream_t)0x4) 2025-05-07T19:46:54.6040514Z #define cudaStreamGetCaptureInfo __CUDART_API_PTSZ(cudaStreamGetCaptureInfo_v2) 2025-05-07T19:46:54.6040737Z #define cudaStreamGraphFireAndForget (cudaStream_t)0x0200000000000000 2025-05-07T19:46:54.6040999Z #define cudaStreamGraphFireAndForgetAsSibling (cudaStream_t)0x0300000000000000 2025-05-07T19:46:54.6041233Z #define cudaStreamGraphTailLaunch (cudaStream_t)0x0100000000000000 2025-05-07T19:46:54.6041365Z #define cudaStreamLegacy ((cudaStream_t)0x1) 2025-05-07T19:46:54.6041480Z #define cudaStreamNonBlocking 0x01 2025-05-07T19:46:54.6041651Z #define cudaStreamPerThread ((cudaStream_t)0x2) 2025-05-07T19:46:54.6041793Z #define cudaStreamTailLaunch ((cudaStream_t)0x3) 2025-05-07T19:46:54.6041902Z #define cudaSurfaceType1D 0x01 2025-05-07T19:46:54.6042011Z #define cudaSurfaceType1DLayered 0xF1 2025-05-07T19:46:54.6042139Z #define cudaSurfaceType2D 0x02 2025-05-07T19:46:54.6042263Z #define cudaSurfaceType2DLayered 0xF2 2025-05-07T19:46:54.6042366Z #define cudaSurfaceType3D 0x03 2025-05-07T19:46:54.6042509Z #define cudaSurfaceTypeCubemap 0x0C 2025-05-07T19:46:54.6042635Z #define cudaSurfaceTypeCubemapLayered 0xFC 2025-05-07T19:46:54.6042739Z #define cudaTextureType1D 0x01 2025-05-07T19:46:54.6042852Z #define cudaTextureType1DLayered 0xF1 2025-05-07T19:46:54.6042981Z #define cudaTextureType2D 0x02 2025-05-07T19:46:54.6043095Z #define cudaTextureType2DLayered 0xF2 2025-05-07T19:46:54.6043200Z #define cudaTextureType3D 0x03 2025-05-07T19:46:54.6043325Z #define cudaTextureTypeCubemap 0x0C 2025-05-07T19:46:54.6043451Z 
#define cudaTextureTypeCubemapLayered 0xFC 2025-05-07T19:46:54.6043779Z #define cudaWaitExternalSemaphoresAsync __CUDART_API_PTSZ(cudaWaitExternalSemaphoresAsync_v2) 2025-05-07T19:46:54.6043887Z #define getc(_fp) _IO_getc (_fp) 2025-05-07T19:46:54.6044014Z #define htobe16(x) __bswap_16 (x) 2025-05-07T19:46:54.6044117Z #define htobe32(x) __bswap_32 (x) 2025-05-07T19:46:54.6044279Z #define htobe64(x) __bswap_64 (x) 2025-05-07T19:46:54.6044393Z #define htole16(x) (x) 2025-05-07T19:46:54.6044486Z #define htole32(x) (x) 2025-05-07T19:46:54.6044578Z #define htole64(x) (x) 2025-05-07T19:46:54.6044726Z #define isalnum_l(c,l) __isalnum_l ((c), (l)) 2025-05-07T19:46:54.6044844Z #define isalpha_l(c,l) __isalpha_l ((c), (l)) 2025-05-07T19:46:54.6044945Z #define isascii(c) __isascii (c) 2025-05-07T19:46:54.6045066Z #define isascii_l(c,l) __isascii_l ((c), (l)) 2025-05-07T19:46:54.6045211Z #define isblank_l(c,l) __isblank_l ((c), (l)) 2025-05-07T19:46:54.6045328Z #define iscntrl_l(c,l) __iscntrl_l ((c), (l)) 2025-05-07T19:46:54.6045447Z #define isdigit_l(c,l) __isdigit_l ((c), (l)) 2025-05-07T19:46:54.6045590Z #define isgraph_l(c,l) __isgraph_l ((c), (l)) 2025-05-07T19:46:54.6045704Z #define islower_l(c,l) __islower_l ((c), (l)) 2025-05-07T19:46:54.6045823Z #define isprint_l(c,l) __isprint_l ((c), (l)) 2025-05-07T19:46:54.6045942Z #define ispunct_l(c,l) __ispunct_l ((c), (l)) 2025-05-07T19:46:54.6046097Z #define isspace_l(c,l) __isspace_l ((c), (l)) 2025-05-07T19:46:54.6046213Z #define isupper_l(c,l) __isupper_l ((c), (l)) 2025-05-07T19:46:54.6046341Z #define isxdigit_l(c,l) __isxdigit_l ((c), (l)) 2025-05-07T19:46:54.6046464Z #define le16toh(x) (x) 2025-05-07T19:46:54.6046555Z #define le32toh(x) (x) 2025-05-07T19:46:54.6046650Z #define le64toh(x) (x) 2025-05-07T19:46:54.6046739Z #define linux 1 2025-05-07T19:46:54.6046880Z #define major(dev) gnu_dev_major (dev) 2025-05-07T19:46:54.6047009Z #define makedev(maj,min) gnu_dev_makedev (maj, min) 2025-05-07T19:46:54.6047158Z #define 
math_errhandling (MATH_ERRNO | MATH_ERREXCEPT) 2025-05-07T19:46:54.6047295Z #define minor(dev) gnu_dev_minor (dev) 2025-05-07T19:46:54.6047419Z #define offsetof(t,d) __builtin_offsetof(t, d) 2025-05-07T19:46:54.6047532Z #define putc(_ch,_fp) _IO_putc (_ch, _fp) 2025-05-07T19:46:54.6066335Z #define stderr stderr 2025-05-07T19:46:54.6066522Z #define stdin stdin 2025-05-07T19:46:54.6066686Z #define stdout stdout 2025-05-07T19:46:54.6067574Z #define strdupa(s) (__extension__ ({ const char *__old = (s); size_t __len = strlen (__old) + 1; char *__new = (char *) __builtin_alloca (__len); (char *) memcpy (__new, __old, __len); })) 2025-05-07T19:46:54.6068167Z #define strndupa(s,n) (__extension__ ({ const char *__old = (s); size_t __len = strnlen (__old, (n)); char *__new = (char *) __builtin_alloca (__len + 1); __new[__len] = '\0'; (char *) memcpy (__new, __old, __len); })) 2025-05-07T19:46:54.6068307Z #define toascii(c) __toascii (c) 2025-05-07T19:46:54.6068443Z #define toascii_l(c,l) __toascii_l ((c), (l)) 2025-05-07T19:46:54.6068539Z #define unix 1 2025-05-07T19:46:54.6068709Z #define w_coredump __wait_terminated.__w_coredump 2025-05-07T19:46:54.6068852Z #define w_retcode __wait_terminated.__w_retcode 2025-05-07T19:46:54.6068984Z #define w_stopsig __wait_stopped.__w_stopsig 2025-05-07T19:46:54.6069134Z #define w_stopval __wait_stopped.__w_stopval 2025-05-07T19:46:54.6069268Z #define w_termsig __wait_terminated.__w_termsig 2025-05-07T19:46:54.6069285Z 2025-05-07T19:46:54.6171320Z 2025-05-07T19:46:54.6171654Z + conda run -n build_binary nvcc --version 2025-05-07T19:46:54.6171666Z 2025-05-07T19:46:56.4709317Z nvcc: NVIDIA (R) Cuda compiler driver 2025-05-07T19:46:56.4709795Z Copyright (c) 2005-2025 NVIDIA Corporation 2025-05-07T19:46:56.4710165Z Built on Wed_Jan_15_19:20:09_PST_2025 2025-05-07T19:46:56.4710520Z Cuda compilation tools, release 12.8, V12.8.61 2025-05-07T19:46:56.4710923Z Build cuda_12.8.r12.8/compiler.35404655_0 2025-05-07T19:46:56.4711157Z 
2025-05-07T19:46:56.5453190Z 2025-05-07T19:46:56.5460622Z which: no nvidia-smi in (CONDA=/github/home/miniconda:/github/home/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:46:56.5461455Z [CHECK] nvidia-smi not found 2025-05-07T19:46:56.5461823Z [INSTALL] Successfully installed CUDA 12.8.0 2025-05-07T19:46:56.5570783Z ##[group]Run . $PRELUDE; install_pytorch_pip $BUILD_ENV nightly cuda/12.8.0 2025-05-07T19:46:56.5571611Z . $PRELUDE; install_pytorch_pip $BUILD_ENV nightly cuda/12.8.0 2025-05-07T19:46:56.5572252Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:46:56.5572636Z env: 2025-05-07T19:46:56.5572909Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:46:56.5573243Z BUILD_ENV: build_binary 2025-05-07T19:46:56.5573539Z BUILD_TARGET: genai 2025-05-07T19:46:56.5573789Z BUILD_VARIANT: cuda 2025-05-07T19:46:56.5574078Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:46:56.5574355Z ##[endgroup] 2025-05-07T19:46:56.9936571Z ################################################################################ 2025-05-07T19:46:56.9937917Z # Install PyTorch (PIP) 2025-05-07T19:46:56.9938599Z # 2025-05-07T19:46:56.9948804Z # [2025-05-07T19:46:56.994Z] + install_pytorch_pip build_binary nightly cuda/12.8.0 2025-05-07T19:46:56.9950467Z ################################################################################ 2025-05-07T19:46:56.9951169Z 2025-05-07T19:46:56.9973963Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y numpy 2025-05-07T19:46:57.9312783Z Channels: 2025-05-07T19:46:57.9313311Z - conda-forge 2025-05-07T19:46:57.9313715Z Platform: linux-64 2025-05-07T19:47:01.2025113Z Collecting package metadata (repodata.json): - \ | / - done 2025-05-07T19:47:02.9200728Z Solving environment: | / - \ done 2025-05-07T19:47:03.2373600Z 2025-05-07T19:47:03.2374167Z ## Package Plan ## 2025-05-07T19:47:03.2374640Z 2025-05-07T19:47:03.2375280Z environment location: 
/github/home/miniconda/envs/build_binary 2025-05-07T19:47:03.2376241Z 2025-05-07T19:47:03.2376525Z added / updated specs: 2025-05-07T19:47:03.2377485Z - numpy 2025-05-07T19:47:03.2377833Z 2025-05-07T19:47:03.2377847Z 2025-05-07T19:47:03.2378198Z The following packages will be downloaded: 2025-05-07T19:47:03.2378890Z 2025-05-07T19:47:03.2379274Z package | build 2025-05-07T19:47:03.2380247Z ---------------------------|----------------- 2025-05-07T19:47:03.2380687Z libblas-3.9.0 |31_h59b9bed_openblas 16 KB conda-forge 2025-05-07T19:47:03.2381223Z libcblas-3.9.0 |31_he106b2a_openblas 16 KB conda-forge 2025-05-07T19:47:03.2381732Z liblapack-3.9.0 |31_h7ac8fdf_openblas 16 KB conda-forge 2025-05-07T19:47:03.2382240Z numpy-2.0.2 | py39h9cb892a_1 7.6 MB conda-forge 2025-05-07T19:47:03.2382667Z ------------------------------------------------------------ 2025-05-07T19:47:03.2383071Z Total: 7.6 MB 2025-05-07T19:47:03.2383305Z 2025-05-07T19:47:03.2383476Z The following NEW packages will be INSTALLED: 2025-05-07T19:47:03.2383720Z 2025-05-07T19:47:03.2383974Z libblas conda-forge/linux-64::libblas-3.9.0-31_h59b9bed_openblas 2025-05-07T19:47:03.2384698Z libcblas conda-forge/linux-64::libcblas-3.9.0-31_he106b2a_openblas 2025-05-07T19:47:03.2385262Z liblapack conda-forge/linux-64::liblapack-3.9.0-31_h7ac8fdf_openblas 2025-05-07T19:47:03.2385817Z numpy conda-forge/linux-64::numpy-2.0.2-py39h9cb892a_1 2025-05-07T19:47:03.2386102Z 2025-05-07T19:47:03.2386106Z 2025-05-07T19:47:03.2386110Z 2025-05-07T19:47:03.2386296Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:47:03.6203337Z liblapack-3.9.0 | 16 KB | ########## | 100% 2025-05-07T19:47:03.6460174Z libcblas-3.9.0 | 16 KB | ########## | 100% 2025-05-07T19:47:03.6882582Z libblas-3.9.0 | 16 KB | ########## | 100% 2025-05-07T19:47:04.0769039Z numpy-2.0.2 | 7.6 MB | ########## | 100%
2025-05-07T19:47:04.0770927Z  2025-05-07T19:47:04.0771187Z 2025-05-07T19:47:04.0771191Z 2025-05-07T19:47:04.0771214Z 2025-05-07T19:47:04.0771416Z  done 2025-05-07T19:47:04.1780181Z Preparing transaction: / done 2025-05-07T19:47:04.3801336Z Verifying transaction: \ | done 2025-05-07T19:47:04.4811674Z Executing transaction: - done 2025-05-07T19:47:04.5869241Z ################################################################################ 2025-05-07T19:47:04.5870482Z # Install Package From PyTorch PIP: torch 2025-05-07T19:47:04.5870845Z # 2025-05-07T19:47:04.5889539Z # [2025-05-07T19:47:04.588Z] + install_from_pytorch_pip build_binary torch nightly cuda/12.8.0 2025-05-07T19:47:04.5890425Z ################################################################################ 2025-05-07T19:47:04.5890668Z 2025-05-07T19:47:04.5905449Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:47:04.6800459Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:47:04.6801022Z ################################################################################ 2025-05-07T19:47:04.6801405Z # Prepare PIP Arguments (PyTorch PIP) 2025-05-07T19:47:04.6801746Z # 2025-05-07T19:47:04.6816270Z # [2025-05-07T19:47:04.681Z] + __prepare_pip_arguments torch nightly cuda/12.8.0 2025-05-07T19:47:04.6816859Z ################################################################################ 2025-05-07T19:47:04.6817213Z 2025-05-07T19:47:04.6850372Z [INSTALL] Extracted package (channel, version): (nightly, LATEST) 2025-05-07T19:47:04.6873164Z [INSTALL] Extracted package variant: cu128 2025-05-07T19:47:04.6892261Z [INSTALL] Using a non-RELEASE channel: nightly ... 
2025-05-07T19:47:04.6893983Z [INSTALL] Extracted the full PIP channel: https://download.pytorch.org/whl/nightly/cu128/ 2025-05-07T19:47:04.6898181Z [INSTALL] Extracted the full PIP package: --pre torch 2025-05-07T19:47:04.6906195Z [INSTALL] Attempting to install [torch, LATEST] from PyTorch PIP using channel https://download.pytorch.org/whl/nightly/cu128/ ... 2025-05-07T19:47:04.6935388Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128/ 2025-05-07T19:48:53.3347679Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-05-07T19:48:53.3349342Z 2025-05-07T19:48:53.3349602Z Looking in indexes: https://download.pytorch.org/whl/nightly/cu128/ 2025-05-07T19:48:53.3350118Z Collecting torch 2025-05-07T19:48:53.3350902Z Downloading https://download.pytorch.org/whl/nightly/cu128/torch-2.8.0.dev20250507%2Bcu128-cp39-cp39-manylinux_2_28_x86_64.whl.metadata (30 kB) 2025-05-07T19:48:53.3351646Z Collecting filelock (from torch) 2025-05-07T19:48:53.3352186Z Downloading https://download.pytorch.org/whl/nightly/filelock-3.16.1-py3-none-any.whl (16 kB) 2025-05-07T19:48:53.3353136Z Requirement already satisfied: typing-extensions>=4.10.0 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from torch) (4.13.2) 2025-05-07T19:48:53.3353881Z Collecting sympy>=1.13.3 (from torch) 2025-05-07T19:48:53.3354390Z Downloading https://download.pytorch.org/whl/nightly/sympy-1.13.3-py3-none-any.whl (6.2 MB) 2025-05-07T19:48:53.3355358Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 32.7 MB/s eta 0:00:00 2025-05-07T19:48:53.3355702Z Collecting networkx (from torch) 
2025-05-07T19:48:53.3356211Z Downloading https://download.pytorch.org/whl/nightly/networkx-3.2.1-py3-none-any.whl (1.6 MB) 2025-05-07T19:48:53.3356865Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 10.8 MB/s eta 0:00:00 2025-05-07T19:48:53.3357569Z Requirement already satisfied: jinja2 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from torch) (3.1.6) 2025-05-07T19:48:53.3358260Z Collecting fsspec (from torch) 2025-05-07T19:48:53.3358751Z Downloading https://download.pytorch.org/whl/nightly/fsspec-2024.10.0-py3-none-any.whl (179 kB) 2025-05-07T19:48:53.3359360Z Collecting nvidia-cuda-nvrtc-cu12==12.8.61 (from torch) 2025-05-07T19:48:53.3360197Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_nvrtc_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:53.3361056Z Collecting nvidia-cuda-runtime-cu12==12.8.57 (from torch) 2025-05-07T19:48:53.3361896Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_runtime_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:53.3362753Z Collecting nvidia-cuda-cupti-cu12==12.8.57 (from torch) 2025-05-07T19:48:53.3363606Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_cupti_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:53.3364447Z Collecting nvidia-cudnn-cu12==9.8.0.87 (from torch) 2025-05-07T19:48:53.3365152Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cudnn_cu12-9.8.0.87-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB) 2025-05-07T19:48:53.3365894Z Collecting nvidia-cublas-cu12==12.8.3.14 (from torch) 2025-05-07T19:48:53.3366791Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:53.3367826Z Collecting nvidia-cufft-cu12==11.3.3.41 (from torch) 
2025-05-07T19:48:53.3368859Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufft_cu12-11.3.3.41-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB) 2025-05-07T19:48:53.3369751Z Collecting nvidia-curand-cu12==10.3.9.55 (from torch) 2025-05-07T19:48:53.3370539Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_curand_cu12-10.3.9.55-py3-none-manylinux_2_27_x86_64.whl.metadata (1.5 kB) 2025-05-07T19:48:53.3371720Z Collecting nvidia-cusolver-cu12==11.7.2.55 (from torch) 2025-05-07T19:48:53.3372561Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusolver_cu12-11.7.2.55-py3-none-manylinux_2_27_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:48:53.3373582Z Collecting nvidia-cusparse-cu12==12.5.7.53 (from torch) 2025-05-07T19:48:53.3374569Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparse_cu12-12.5.7.53-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:48:53.3375419Z Collecting nvidia-cusparselt-cu12==0.6.3 (from torch) 2025-05-07T19:48:53.3376155Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl.metadata (6.8 kB) 2025-05-07T19:48:53.3376978Z Collecting nvidia-nccl-cu12==2.26.2 (from torch) 2025-05-07T19:48:53.3378008Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nccl_cu12-2.26.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB) 2025-05-07T19:48:53.3378875Z Collecting nvidia-nvtx-cu12==12.8.55 (from torch) 2025-05-07T19:48:53.3379751Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvtx_cu12-12.8.55-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:48:53.3380623Z Collecting nvidia-nvjitlink-cu12==12.8.61 (from torch) 2025-05-07T19:48:53.3381537Z Downloading 
https://download.pytorch.org/whl/nightly/cu128/nvidia_nvjitlink_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:53.3382416Z Collecting nvidia-cufile-cu12==1.13.0.11 (from torch) 2025-05-07T19:48:53.3383297Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufile_cu12-1.13.0.11-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB) 2025-05-07T19:48:53.3384283Z Collecting pytorch-triton==3.3.0+git96316ce5 (from torch) 2025-05-07T19:48:53.3385121Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.3.0%2Bgit96316ce5-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:48:53.3386436Z Requirement already satisfied: setuptools>=40.8.0 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from pytorch-triton==3.3.0+git96316ce5->torch) (78.1.1) 2025-05-07T19:48:53.3387336Z Collecting mpmath<1.4,>=1.1.0 (from sympy>=1.13.3->torch) 2025-05-07T19:48:53.3387892Z Downloading https://download.pytorch.org/whl/nightly/mpmath-1.3.0-py3-none-any.whl (536 kB) 2025-05-07T19:48:53.3388629Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 4.5 MB/s eta 0:00:00 2025-05-07T19:48:53.3389381Z Requirement already satisfied: MarkupSafe>=2.0 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from jinja2->torch) (3.0.2) 2025-05-07T19:48:53.3390499Z Downloading https://download.pytorch.org/whl/nightly/cu128/torch-2.8.0.dev20250507%2Bcu128-cp39-cp39-manylinux_2_28_x86_64.whl (1047.1 MB) 2025-05-07T19:48:53.3391359Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 GB 27.8 MB/s eta 0:00:00 2025-05-07T19:48:53.3392066Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_x86_64.whl (609.6 MB) 2025-05-07T19:48:53.3392889Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 609.6/609.6 MB 40.1 MB/s eta 0:00:00 2025-05-07T19:48:53.3393683Z Downloading 
https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_cupti_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (10.2 MB) 2025-05-07T19:48:53.3394566Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.2/10.2 MB 40.4 MB/s eta 0:00:00 2025-05-07T19:48:53.3395369Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_nvrtc_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (88.0 MB) 2025-05-07T19:48:53.3396460Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.0/88.0 MB 75.9 MB/s eta 0:00:00 2025-05-07T19:48:53.3397416Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_runtime_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (954 kB) 2025-05-07T19:48:53.3398340Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 954.8/954.8 kB 5.2 MB/s eta 0:00:00 2025-05-07T19:48:53.3399160Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cudnn_cu12-9.8.0.87-py3-none-manylinux_2_27_x86_64.whl (698.0 MB) 2025-05-07T19:48:53.3399987Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 698.0/698.0 MB 35.4 MB/s eta 0:00:00 2025-05-07T19:48:53.3400799Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufft_cu12-11.3.3.41-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (193.1 MB) 2025-05-07T19:48:53.3401719Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 193.1/193.1 MB 85.3 MB/s eta 0:00:00 2025-05-07T19:48:53.3402622Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufile_cu12-1.13.0.11-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.2 MB) 2025-05-07T19:48:53.3403500Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 6.8 MB/s eta 0:00:00 2025-05-07T19:48:53.3404209Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_curand_cu12-10.3.9.55-py3-none-manylinux_2_27_x86_64.whl (63.6 MB) 2025-05-07T19:48:53.3405000Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.6/63.6 MB 66.1 MB/s eta 0:00:00 2025-05-07T19:48:53.3405723Z Downloading 
https://download.pytorch.org/whl/nightly/cu128/nvidia_cusolver_cu12-11.7.2.55-py3-none-manylinux_2_27_x86_64.whl (260.4 MB) 2025-05-07T19:48:53.3406514Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 260.4/260.4 MB 80.6 MB/s eta 0:00:00 2025-05-07T19:48:53.3407320Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparse_cu12-12.5.7.53-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (292.1 MB) 2025-05-07T19:48:53.3408214Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.1/292.1 MB 72.5 MB/s eta 0:00:00 2025-05-07T19:48:53.3408924Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl (156.8 MB) 2025-05-07T19:48:53.3409734Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 156.8/156.8 MB 75.4 MB/s eta 0:00:00 2025-05-07T19:48:53.3410506Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nccl_cu12-2.26.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (201.3 MB) 2025-05-07T19:48:53.3411375Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.3/201.3 MB 74.4 MB/s eta 0:00:00 2025-05-07T19:48:53.3412180Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvjitlink_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.2 MB) 2025-05-07T19:48:53.3413050Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.2/39.2 MB 68.9 MB/s eta 0:00:00 2025-05-07T19:48:53.3413821Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvtx_cu12-12.8.55-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89 kB) 2025-05-07T19:48:53.3415058Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.3.0%2Bgit96316ce5-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (153.4 MB) 2025-05-07T19:48:53.3415961Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 153.4/153.4 MB 81.2 MB/s eta 0:00:00 2025-05-07T19:48:53.3418122Z Installing collected packages: nvidia-cusparselt-cu12, mpmath, sympy, pytorch-triton, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, 
nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufile-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, networkx, fsspec, filelock, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, nvidia-cusolver-cu12, torch 2025-05-07T19:48:53.3419834Z 2025-05-07T19:48:53.3421982Z Successfully installed filelock-3.16.1 fsspec-2024.10.0 mpmath-1.3.0 networkx-3.2.1 nvidia-cublas-cu12-12.8.3.14 nvidia-cuda-cupti-cu12-12.8.57 nvidia-cuda-nvrtc-cu12-12.8.61 nvidia-cuda-runtime-cu12-12.8.57 nvidia-cudnn-cu12-9.8.0.87 nvidia-cufft-cu12-11.3.3.41 nvidia-cufile-cu12-1.13.0.11 nvidia-curand-cu12-10.3.9.55 nvidia-cusolver-cu12-11.7.2.55 nvidia-cusparse-cu12-12.5.7.53 nvidia-cusparselt-cu12-0.6.3 nvidia-nccl-cu12-2.26.2 nvidia-nvjitlink-cu12-12.8.61 nvidia-nvtx-cu12-12.8.55 pytorch-triton-3.3.0+git96316ce5 sympy-1.13.3 torch-2.8.0.dev20250507+cu128 2025-05-07T19:48:53.3424265Z 2025-05-07T19:48:55.4701348Z torch 2.8.0.dev20250507+cu128 2025-05-07T19:48:55.4703792Z [CHECK] The installed package [torch, nightly/LATEST] is the correct variant (cu128) 2025-05-07T19:48:58.7989125Z [CHECK] Python (sub-)package 'torch.distributed' found ... 2025-05-07T19:49:02.1291119Z [CHECK] NOTE: The installed version is: 2.8.0.dev20250507+cu128 2025-05-07T19:49:02.1291613Z [CHECK] NOTE: Checking _GLIBCXX_USE_CXX11_ABI ... 2025-05-07T19:49:05.4803083Z True 2025-05-07T19:49:05.4803790Z True 2025-05-07T19:49:05.4804103Z 2025-05-07T19:49:05.5395039Z [INSTALL] Successfully installed PyTorch through PyTorch PIP 2025-05-07T19:49:05.5473589Z ##[group]Run if . $PRELUDE && which conda; then collect_pytorch_env_info $BUILD_ENV; fi 2025-05-07T19:49:05.5474254Z if . 
$PRELUDE && which conda; then collect_pytorch_env_info $BUILD_ENV; fi 2025-05-07T19:49:05.5474902Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:49:05.5475263Z env: 2025-05-07T19:49:05.5475505Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:49:05.5475813Z BUILD_ENV: build_binary 2025-05-07T19:49:05.5476082Z BUILD_TARGET: genai 2025-05-07T19:49:05.5476329Z BUILD_VARIANT: cuda 2025-05-07T19:49:05.5476567Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:49:05.5476840Z ##[endgroup] 2025-05-07T19:49:05.9874864Z /github/home/miniconda/bin/conda 2025-05-07T19:49:05.9875763Z ################################################################################ 2025-05-07T19:49:05.9876320Z # Collect PyTorch Environment Information (for Reporting Issues) 2025-05-07T19:49:05.9876724Z # 2025-05-07T19:49:05.9893931Z # [2025-05-07T19:49:05.988Z] + collect_pytorch_env_info build_binary 2025-05-07T19:49:05.9894426Z ################################################################################ 2025-05-07T19:49:05.9894673Z 2025-05-07T19:49:05.9909979Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:49:06.0806446Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:49:06.0813776Z [INFO] Downloading the PyTorch environment info collection script ... 2025-05-07T19:49:06.0815706Z + wget -q https://raw.githubusercontent.com/pytorch/pytorch/main/torch/utils/collect_env.py 2025-05-07T19:49:06.0817206Z 2025-05-07T19:49:06.1619889Z 2025-05-07T19:49:06.1621063Z [INFO] Collecting PyTorch environment info (will be needed for reporting issues to PyTorch) ... 2025-05-07T19:49:06.1648168Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary python collect_env.py 2025-05-07T19:49:11.9338281Z Collecting environment information... 
2025-05-07T19:49:11.9338802Z PyTorch version: 2.8.0.dev20250507+cu128 2025-05-07T19:49:11.9339231Z Is debug build: False 2025-05-07T19:49:11.9339520Z CUDA used to build PyTorch: 12.8 2025-05-07T19:49:11.9339869Z ROCM used to build PyTorch: N/A 2025-05-07T19:49:11.9340065Z 2025-05-07T19:49:11.9340218Z OS: Amazon Linux 2023.7.20250428 (x86_64) 2025-05-07T19:49:11.9340586Z GCC version: Could not collect 2025-05-07T19:49:11.9341675Z Clang version: 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:49:11.9342308Z CMake version: version 4.0.2 2025-05-07T19:49:11.9342621Z Libc version: glibc-2.34 2025-05-07T19:49:11.9342791Z 2025-05-07T19:49:11.9343133Z Python version: 3.9.22 | packaged by conda-forge | (main, Apr 14 2025, 23:35:59) [GCC 13.3.0] (64-bit runtime) 2025-05-07T19:49:11.9343946Z Python platform: Linux-6.1.130-139.222.amzn2023.x86_64-x86_64-with-glibc2.34 2025-05-07T19:49:11.9344415Z Is CUDA available: False 2025-05-07T19:49:11.9344684Z CUDA runtime version: 12.8.61 2025-05-07T19:49:11.9345004Z CUDA_MODULE_LOADING set to: N/A 2025-05-07T19:49:11.9345342Z GPU models and configuration: Could not collect 2025-05-07T19:49:11.9345727Z Nvidia driver version: Could not collect 2025-05-07T19:49:11.9346050Z cuDNN version: Could not collect 2025-05-07T19:49:11.9346364Z HIP runtime version: N/A 2025-05-07T19:49:11.9346628Z MIOpen runtime version: N/A 2025-05-07T19:49:11.9346940Z Is XNNPACK available: True 2025-05-07T19:49:11.9347112Z 2025-05-07T19:49:11.9347349Z CPU: 2025-05-07T19:49:11.9347572Z Architecture: x86_64 2025-05-07T19:49:11.9347951Z CPU op-mode(s): 32-bit, 64-bit 2025-05-07T19:49:11.9348360Z Address sizes: 46 bits physical, 48 bits virtual 2025-05-07T19:49:11.9348794Z Byte Order: Little Endian 2025-05-07T19:49:11.9349122Z CPU(s): 96 2025-05-07T19:49:11.9349456Z On-line CPU(s) list: 0-95 2025-05-07T19:49:11.9349781Z Vendor ID: GenuineIntel 2025-05-07T19:49:11.9350424Z Model name: Intel(R) Xeon(R) 
Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:49:11.9350856Z CPU family: 6 2025-05-07T19:49:11.9351205Z Model: 85 2025-05-07T19:49:11.9351527Z Thread(s) per core: 2 2025-05-07T19:49:11.9351839Z Core(s) per socket: 24 2025-05-07T19:49:11.9352175Z Socket(s): 2 2025-05-07T19:49:11.9352506Z Stepping: 7 2025-05-07T19:49:11.9352822Z BogoMIPS: 6000.01 2025-05-07T19:49:11.9355158Z Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:49:11.9357491Z Hypervisor vendor: KVM 2025-05-07T19:49:11.9357820Z Virtualization type: full 2025-05-07T19:49:11.9358208Z L1d cache: 1.5 MiB (48 instances) 2025-05-07T19:49:11.9358622Z L1i cache: 1.5 MiB (48 instances) 2025-05-07T19:49:11.9359002Z L2 cache: 48 MiB (48 instances) 2025-05-07T19:49:11.9359397Z L3 cache: 71.5 MiB (2 instances) 2025-05-07T19:49:11.9359729Z NUMA node(s): 2 2025-05-07T19:49:11.9360066Z NUMA node0 CPU(s): 0-23,48-71 2025-05-07T19:49:11.9360581Z NUMA node1 CPU(s): 24-47,72-95 2025-05-07T19:49:11.9361098Z Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status 2025-05-07T19:49:11.9361707Z Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported 2025-05-07T19:49:11.9362218Z Vulnerability L1tf: Mitigation; PTE Inversion 2025-05-07T19:49:11.9362873Z Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:49:11.9363566Z Vulnerability Meltdown: Mitigation; PTI 2025-05-07T19:49:11.9364221Z 
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:49:11.9364860Z Vulnerability Reg file data sampling: Not affected 2025-05-07T19:49:11.9365281Z Vulnerability Retbleed: Vulnerable 2025-05-07T19:49:11.9365700Z Vulnerability Spec rstack overflow: Not affected 2025-05-07T19:49:11.9366091Z Vulnerability Spec store bypass: Vulnerable 2025-05-07T19:49:11.9366710Z Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization 2025-05-07T19:49:11.9367770Z Vulnerability Spectre v2: Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline 2025-05-07T19:49:11.9368660Z Vulnerability Srbds: Not affected 2025-05-07T19:49:11.9369056Z Vulnerability Tsx async abort: Not affected 2025-05-07T19:49:11.9369351Z 2025-05-07T19:49:11.9369465Z Versions of relevant libraries: 2025-05-07T19:49:11.9369783Z [pip3] numpy==2.0.2 2025-05-07T19:49:11.9370051Z [pip3] nvidia-cublas-cu12==12.8.3.14 2025-05-07T19:49:11.9370406Z [pip3] nvidia-cuda-cupti-cu12==12.8.57 2025-05-07T19:49:11.9370739Z [pip3] nvidia-cuda-nvrtc-cu12==12.8.61 2025-05-07T19:49:11.9371104Z [pip3] nvidia-cuda-runtime-cu12==12.8.57 2025-05-07T19:49:11.9371442Z [pip3] nvidia-cudnn-cu12==9.8.0.87 2025-05-07T19:49:11.9371779Z [pip3] nvidia-cufft-cu12==11.3.3.41 2025-05-07T19:49:11.9372105Z [pip3] nvidia-curand-cu12==10.3.9.55 2025-05-07T19:49:11.9372433Z [pip3] nvidia-cusolver-cu12==11.7.2.55 2025-05-07T19:49:11.9372908Z [pip3] nvidia-cusparse-cu12==12.5.7.53 2025-05-07T19:49:11.9373253Z [pip3] nvidia-cusparselt-cu12==0.6.3 2025-05-07T19:49:11.9373589Z [pip3] nvidia-nccl-cu12==2.26.2 2025-05-07T19:49:11.9373889Z [pip3] nvidia-nvjitlink-cu12==12.8.61 2025-05-07T19:49:11.9374224Z [pip3] nvidia-nvtx-cu12==12.8.55 2025-05-07T19:49:11.9374533Z [pip3] pytorch-triton==3.3.0+git96316ce5 2025-05-07T19:49:11.9374872Z [pip3] torch==2.8.0.dev20250507+cu128 2025-05-07T19:49:11.9375271Z [conda] cuda-cudart 
12.8.57 h5888daf_1 conda-forge 2025-05-07T19:49:11.9375817Z [conda] cuda-cudart-dev 12.8.57 h5888daf_1 conda-forge 2025-05-07T19:49:11.9376377Z [conda] cuda-cudart-dev_linux-64 12.8.57 h3f2d84a_1 conda-forge 2025-05-07T19:49:11.9377057Z [conda] cuda-cudart-static 12.8.57 h5888daf_1 conda-forge 2025-05-07T19:49:11.9377662Z [conda] cuda-cudart-static_linux-64 12.8.57 h3f2d84a_1 conda-forge 2025-05-07T19:49:11.9378246Z [conda] cuda-cudart_linux-64 12.8.57 h3f2d84a_1 conda-forge 2025-05-07T19:49:11.9378790Z [conda] cuda-cupti 12.8.57 hbd13f7d_0 conda-forge 2025-05-07T19:49:11.9379295Z [conda] cuda-cupti-dev 12.8.57 h5888daf_0 conda-forge 2025-05-07T19:49:11.9379850Z [conda] cuda-libraries 12.8.0 ha770c72_0 conda-forge 2025-05-07T19:49:11.9380409Z [conda] cuda-libraries-dev 12.8.0 ha770c72_0 conda-forge 2025-05-07T19:49:11.9380928Z [conda] cuda-nvrtc 12.8.61 hbd13f7d_0 conda-forge 2025-05-07T19:49:11.9381453Z [conda] cuda-nvrtc-dev 12.8.61 h5888daf_0 conda-forge 2025-05-07T19:49:11.9381949Z [conda] cuda-nvtx 12.8.55 hbd13f7d_0 conda-forge 2025-05-07T19:49:11.9382462Z [conda] cuda-opencl 12.8.55 hbd13f7d_0 conda-forge 2025-05-07T19:49:11.9382973Z [conda] cuda-opencl-dev 12.8.55 h5888daf_0 conda-forge 2025-05-07T19:49:11.9383498Z [conda] cuda-runtime 12.8.0 ha804496_0 conda-forge 2025-05-07T19:49:11.9384004Z [conda] libcublas 12.8.3.14 h9ab20c4_0 conda-forge 2025-05-07T19:49:11.9384509Z [conda] libcublas-dev 12.8.3.14 h9ab20c4_0 conda-forge 2025-05-07T19:49:11.9385139Z [conda] libcufft 11.3.3.41 hbd13f7d_0 conda-forge 2025-05-07T19:49:11.9385629Z [conda] libcufft-dev 11.3.3.41 h5888daf_0 conda-forge 2025-05-07T19:49:11.9386145Z [conda] libcurand 10.3.9.55 hbd13f7d_0 conda-forge 2025-05-07T19:49:11.9386641Z [conda] libcurand-dev 10.3.9.55 h5888daf_0 conda-forge 2025-05-07T19:49:11.9387165Z [conda] libcusolver 11.7.2.55 h9ab20c4_0 conda-forge 2025-05-07T19:49:11.9387702Z [conda] libcusolver-dev 11.7.2.55 h9ab20c4_0 conda-forge 2025-05-07T19:49:11.9388217Z [conda] 
libcusparse 12.5.7.53 hbd13f7d_0 conda-forge 2025-05-07T19:49:11.9388749Z [conda] libcusparse-dev 12.5.7.53 h5888daf_0 conda-forge 2025-05-07T19:49:11.9389266Z [conda] libnvjitlink 12.8.61 hbd13f7d_0 conda-forge 2025-05-07T19:49:11.9389905Z [conda] libnvjitlink-dev 12.8.61 h5888daf_0 conda-forge 2025-05-07T19:49:11.9390379Z [conda] numpy 2.0.2 py39h9cb892a_1 conda-forge 2025-05-07T19:49:11.9390831Z [conda] nvidia-cublas-cu12 12.8.3.14 pypi_0 pypi 2025-05-07T19:49:11.9391345Z [conda] nvidia-cuda-cupti-cu12 12.8.57 pypi_0 pypi 2025-05-07T19:49:11.9391838Z [conda] nvidia-cuda-nvrtc-cu12 12.8.61 pypi_0 pypi 2025-05-07T19:49:11.9392369Z [conda] nvidia-cuda-runtime-cu12 12.8.57 pypi_0 pypi 2025-05-07T19:49:11.9392942Z [conda] nvidia-cudnn-cu12 9.8.0.87 pypi_0 pypi 2025-05-07T19:49:11.9393458Z [conda] nvidia-cufft-cu12 11.3.3.41 pypi_0 pypi 2025-05-07T19:49:11.9393972Z [conda] nvidia-curand-cu12 10.3.9.55 pypi_0 pypi 2025-05-07T19:49:11.9394467Z [conda] nvidia-cusolver-cu12 11.7.2.55 pypi_0 pypi 2025-05-07T19:49:11.9394999Z [conda] nvidia-cusparse-cu12 12.5.7.53 pypi_0 pypi 2025-05-07T19:49:11.9395512Z [conda] nvidia-cusparselt-cu12 0.6.3 pypi_0 pypi 2025-05-07T19:49:11.9396037Z [conda] nvidia-nccl-cu12 2.26.2 pypi_0 pypi 2025-05-07T19:49:11.9396519Z [conda] nvidia-nvjitlink-cu12 12.8.61 pypi_0 pypi 2025-05-07T19:49:11.9397013Z [conda] nvidia-nvtx-cu12 12.8.55 pypi_0 pypi 2025-05-07T19:49:11.9397502Z [conda] pytorch-triton 3.3.0+git96316ce5 pypi_0 pypi 2025-05-07T19:49:11.9397966Z [conda] torch 2.8.0.dev20250507+cu128 pypi_0 pypi 2025-05-07T19:49:11.9398258Z 2025-05-07T19:49:12.0197552Z ##[group]Run . $PRELUDE; install_cudnn $BUILD_ENV "$(pwd)/build_only/cudnn" 12.8.0 2025-05-07T19:49:12.0198187Z . 
$PRELUDE; install_cudnn $BUILD_ENV "$(pwd)/build_only/cudnn" 12.8.0 2025-05-07T19:49:12.0198798Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:49:12.0199133Z env: 2025-05-07T19:49:12.0199354Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:49:12.0199674Z BUILD_ENV: build_binary 2025-05-07T19:49:12.0199917Z BUILD_TARGET: genai 2025-05-07T19:49:12.0200157Z BUILD_VARIANT: cuda 2025-05-07T19:49:12.0200384Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:49:12.0200641Z ##[endgroup] 2025-05-07T19:49:12.4494573Z ################################################################################ 2025-05-07T19:49:12.4495173Z # Install cuDNN 2025-05-07T19:49:12.4495407Z # 2025-05-07T19:49:12.4504133Z # [2025-05-07T19:49:12.450Z] + install_cudnn build_binary /__w/FBGEMM/FBGEMM/build_only/cudnn 12.8.0 2025-05-07T19:49:12.4504742Z ################################################################################ 2025-05-07T19:49:12.4504986Z 2025-05-07T19:49:12.4522854Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:49:12.5449459Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:49:12.5454624Z [INSTALL] cuda_concat_version is determined to be: 128 2025-05-07T19:49:12.5455583Z [INSTALL] Could not find cuDNN URL for the given cuda_concat_version 128; defaulting to cuDNN for CUDA 11.8 2025-05-07T19:49:12.5456209Z + rm -rf /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:49:12.5456439Z 2025-05-07T19:49:12.5466289Z 2025-05-07T19:49:12.5466455Z + mkdir -p /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:49:12.5466734Z 2025-05-07T19:49:12.5478896Z 2025-05-07T19:49:12.5507636Z [INSTALL] Downloading cuDNN to /tmp/tmp.OR5xPVHXvu ... 2025-05-07T19:49:12.5533575Z [EXEC] [ATTEMPT 0/3] + wget -q https://developer.download.nvidia.com/compute/redist/cudnn/v8.7.0/local_installers/11.8/cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz -O cudnn.tar.xz 2025-05-07T19:49:17.2997924Z [INSTALL] Unpacking cuDNN ... 
2025-05-07T19:49:17.2998408Z + tar -xvf cudnn.tar.xz 2025-05-07T19:49:17.2998595Z 2025-05-07T19:49:17.3023142Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/ 2025-05-07T19:49:17.3023670Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/ 2025-05-07T19:49:17.3024205Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer_static.a 2025-05-07T19:49:19.7439267Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer_static_v8.a 2025-05-07T19:49:19.7441009Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train_static.a 2025-05-07T19:49:22.0536309Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train_static_v8.a 2025-05-07T19:49:22.0536991Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer_static.a 2025-05-07T19:49:30.4704218Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer_static_v8.a 2025-05-07T19:49:30.4704845Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train_static.a 2025-05-07T19:49:32.1025033Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train_static_v8.a 2025-05-07T19:49:32.1025646Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer_static.a 2025-05-07T19:49:33.8264143Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer_static_v8.a 2025-05-07T19:49:33.8265875Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train_static.a 2025-05-07T19:49:35.3341415Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train_static_v8.a 2025-05-07T19:49:35.3342993Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn.so.8 2025-05-07T19:49:35.3344320Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn.so 2025-05-07T19:49:35.3345655Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn.so.8.7.0 2025-05-07T19:49:35.3356375Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer.so.8 2025-05-07T19:49:35.3357455Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer.so 2025-05-07T19:49:35.3358019Z 
cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer.so.8.7.0 2025-05-07T19:49:37.7084147Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train.so.8 2025-05-07T19:49:37.7085797Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train.so 2025-05-07T19:49:37.7087427Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train.so.8.7.0 2025-05-07T19:49:39.9598368Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer.so 2025-05-07T19:49:39.9599200Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer.so.8 2025-05-07T19:49:39.9599726Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer.so.8.7.0 2025-05-07T19:49:48.6030039Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train.so 2025-05-07T19:49:48.6030811Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train.so.8.7.0 2025-05-07T19:49:50.2597289Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train.so.8 2025-05-07T19:49:50.2597890Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer.so.8.7.0 2025-05-07T19:49:51.9702255Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer.so 2025-05-07T19:49:51.9704680Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer.so.8 2025-05-07T19:49:51.9706258Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train.so.8.7.0 2025-05-07T19:49:53.4801447Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train.so 2025-05-07T19:49:53.4802139Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train.so.8 2025-05-07T19:49:53.4802600Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/ 2025-05-07T19:49:53.4803052Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_v8.h 2025-05-07T19:49:53.4803551Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_infer_v8.h 2025-05-07T19:49:53.4804083Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_train_v8.h 2025-05-07T19:49:53.4804626Z 
cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_backend_v8.h 2025-05-07T19:49:53.4805166Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_infer_v8.h 2025-05-07T19:49:53.4805694Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_train_v8.h 2025-05-07T19:49:53.4806221Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_infer_v8.h 2025-05-07T19:49:53.4806757Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_train_v8.h 2025-05-07T19:49:53.4807280Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_version_v8.h 2025-05-07T19:49:53.4807769Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn.h 2025-05-07T19:49:53.4808262Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_infer.h 2025-05-07T19:49:53.4808769Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_train.h 2025-05-07T19:49:53.4809297Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_backend.h 2025-05-07T19:49:53.4809811Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_infer.h 2025-05-07T19:49:53.4810337Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_train.h 2025-05-07T19:49:53.4810851Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_infer.h 2025-05-07T19:49:53.4812080Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_train.h 2025-05-07T19:49:53.4812847Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_version.h 2025-05-07T19:49:53.4813366Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/LICENSE 2025-05-07T19:49:53.4821826Z 2025-05-07T19:49:53.4822675Z [INSTALL] Moving cuDNN files to /__w/FBGEMM/FBGEMM/build_only/cudnn ... 
2025-05-07T19:49:53.4823198Z + rm -rf /__w/FBGEMM/FBGEMM/build_only/cudnn/include 2025-05-07T19:49:53.4823451Z 2025-05-07T19:49:53.4841654Z 2025-05-07T19:49:53.4842140Z + rm -rf /__w/FBGEMM/FBGEMM/build_only/cudnn/lib 2025-05-07T19:49:53.4842412Z 2025-05-07T19:49:53.4856015Z 2025-05-07T19:49:53.4856518Z + mv cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:49:53.4857341Z 2025-05-07T19:49:53.4887897Z 2025-05-07T19:49:53.4888323Z + mv cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:49:53.4888716Z 2025-05-07T19:49:54.7871095Z 2025-05-07T19:49:54.7871397Z /__w/FBGEMM/FBGEMM 2025-05-07T19:49:54.7871705Z + rm -rf /tmp/tmp.OR5xPVHXvu 2025-05-07T19:49:54.8355832Z 2025-05-07T19:49:54.8355848Z 2025-05-07T19:49:54.8370663Z [INSTALL] Set environment variables CUDNN_INCLUDE_DIR and CUDNN_LIBRARY ... 2025-05-07T19:49:54.8371636Z + conda env config vars set -n build_binary CUDNN_INCLUDE_DIR=/__w/FBGEMM/FBGEMM/build_only/cudnn/include CUDNN_LIBRARY=/__w/FBGEMM/FBGEMM/build_only/cudnn/lib 2025-05-07T19:49:54.8372304Z 2025-05-07T19:49:55.2470839Z 2025-05-07T19:49:55.2471188Z [INSTALL] Successfully installed cuDNN (for CUDA 12.8.0) 2025-05-07T19:49:55.2572802Z ##[group]Run . $PRELUDE; cd fbgemm_gpu; prepare_fbgemm_gpu_build $BUILD_ENV 2025-05-07T19:49:55.2573456Z . 
$PRELUDE; cd fbgemm_gpu; prepare_fbgemm_gpu_build $BUILD_ENV 2025-05-07T19:49:55.2574130Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:49:55.2574600Z env: 2025-05-07T19:49:55.2574818Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:49:55.2575336Z BUILD_ENV: build_binary 2025-05-07T19:49:55.2575600Z BUILD_TARGET: genai 2025-05-07T19:49:55.2575871Z BUILD_VARIANT: cuda 2025-05-07T19:49:55.2576125Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:49:55.2576418Z ##[endgroup] 2025-05-07T19:49:55.6714164Z ################################################################################ 2025-05-07T19:49:55.6714613Z # Prepare FBGEMM-GPU Build 2025-05-07T19:49:55.6714930Z # 2025-05-07T19:49:55.6726850Z # [2025-05-07T19:49:55.672Z] + prepare_fbgemm_gpu_build build_binary 2025-05-07T19:49:55.6727445Z ################################################################################ 2025-05-07T19:49:55.6727698Z 2025-05-07T19:49:55.6741615Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:49:55.7577779Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:49:55.7590170Z [BUILD] Running git submodules update ... 
2025-05-07T19:49:55.7611521Z [EXEC] [ATTEMPT 0/3] + git submodule sync 2025-05-07T19:49:55.7899478Z Synchronizing submodule url for '../external/asmjit' 2025-05-07T19:49:55.7900950Z Synchronizing submodule url for '../external/composable_kernel' 2025-05-07T19:49:55.7902304Z Synchronizing submodule url for '../external/cpuinfo' 2025-05-07T19:49:55.7903530Z Synchronizing submodule url for '../external/cutlass' 2025-05-07T19:49:55.7904756Z Synchronizing submodule url for '../external/googletest' 2025-05-07T19:49:55.7906070Z Synchronizing submodule url for '../external/hipify_torch' 2025-05-07T19:49:55.7907321Z Synchronizing submodule url for '../external/json' 2025-05-07T19:49:55.7925681Z [EXEC] [ATTEMPT 0/3] + git submodule update --init --recursive 2025-05-07T19:49:55.8340873Z [BUILD] Installing other build dependencies ... 2025-05-07T19:49:55.8372471Z [EXEC] [ATTEMPT 0/3] + conda run --no-capture-output -n build_binary python -m pip install -r requirements.txt 2025-05-07T19:49:57.9476743Z Collecting backports.tarfile (from -r requirements.txt (line 13)) 2025-05-07T19:49:57.9596586Z Downloading backports.tarfile-1.2.0-py3-none-any.whl.metadata (2.0 kB) 2025-05-07T19:49:57.9687287Z Requirement already satisfied: build in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from -r requirements.txt (line 14)) (1.2.2.post1) 2025-05-07T19:49:58.1015445Z Collecting cmake (from -r requirements.txt (line 15)) 2025-05-07T19:49:58.1058731Z Downloading cmake-4.0.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.3 kB) 2025-05-07T19:49:58.1134580Z Requirement already satisfied: click in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from -r requirements.txt (line 16)) (8.1.8) 2025-05-07T19:49:58.1135989Z Requirement already satisfied: hypothesis in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from -r requirements.txt (line 17)) (6.131.14) 2025-05-07T19:49:58.1140036Z Requirement already 
satisfied: jinja2 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from -r requirements.txt (line 18)) (3.1.6) 2025-05-07T19:49:58.1144071Z Requirement already satisfied: mpmath==1.3.0 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from -r requirements.txt (line 19)) (1.3.0) 2025-05-07T19:49:58.1463476Z Collecting ninja (from -r requirements.txt (line 20)) 2025-05-07T19:49:58.1506894Z Downloading ninja-1.11.1.4-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (5.0 kB) 2025-05-07T19:49:58.1579202Z Requirement already satisfied: numpy>=2.0.2 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from -r requirements.txt (line 21)) (2.0.2) 2025-05-07T19:49:58.1771975Z Collecting pyre-extensions (from -r requirements.txt (line 22)) 2025-05-07T19:49:58.1811127Z Downloading pyre_extensions-0.0.32-py3-none-any.whl.metadata (4.0 kB) 2025-05-07T19:49:58.1877675Z Requirement already satisfied: pyyaml in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from -r requirements.txt (line 23)) (6.0.2) 2025-05-07T19:49:58.1879053Z Requirement already satisfied: scikit-build in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from -r requirements.txt (line 24)) (0.18.1) 2025-05-07T19:49:58.1881694Z Requirement already satisfied: setuptools in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from -r requirements.txt (line 25)) (78.1.1) 2025-05-07T19:49:58.2120064Z Collecting setuptools_git_versioning (from -r requirements.txt (line 26)) 2025-05-07T19:49:58.2158371Z Downloading setuptools_git_versioning-2.1.0-py3-none-any.whl.metadata (6.1 kB) 2025-05-07T19:49:58.2355848Z Collecting tabulate (from -r requirements.txt (line 27)) 2025-05-07T19:49:58.2387775Z Downloading tabulate-0.9.0-py3-none-any.whl.metadata (34 kB) 2025-05-07T19:49:58.2647541Z Collecting patchelf (from -r requirements.txt (line 28)) 2025-05-07T19:49:58.2691066Z Downloading 
patchelf-0.17.2.2-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.musllinux_1_1_x86_64.whl.metadata (3.5 kB) 2025-05-07T19:49:58.2847972Z Requirement already satisfied: packaging>=19.1 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from build->-r requirements.txt (line 14)) (25.0) 2025-05-07T19:49:58.2852265Z Requirement already satisfied: pyproject_hooks in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from build->-r requirements.txt (line 14)) (1.2.0) 2025-05-07T19:49:58.2856431Z Requirement already satisfied: importlib-metadata>=4.6 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from build->-r requirements.txt (line 14)) (8.7.0) 2025-05-07T19:49:58.2861197Z Requirement already satisfied: tomli>=1.1.0 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from build->-r requirements.txt (line 14)) (2.2.1) 2025-05-07T19:49:58.2983121Z Requirement already satisfied: attrs>=22.2.0 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from hypothesis->-r requirements.txt (line 17)) (25.3.0) 2025-05-07T19:49:58.2987754Z Requirement already satisfied: exceptiongroup>=1.0.0 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from hypothesis->-r requirements.txt (line 17)) (1.2.2) 2025-05-07T19:49:58.2993460Z Requirement already satisfied: sortedcontainers<3.0.0,>=2.1.0 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from hypothesis->-r requirements.txt (line 17)) (2.4.0) 2025-05-07T19:49:58.3016383Z Requirement already satisfied: MarkupSafe>=2.0 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from jinja2->-r requirements.txt (line 18)) (3.0.2) 2025-05-07T19:49:58.3146578Z Collecting typing-inspect (from pyre-extensions->-r requirements.txt (line 22)) 2025-05-07T19:49:58.3184301Z Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB) 2025-05-07T19:49:58.3261603Z Requirement 
already satisfied: typing-extensions in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from pyre-extensions->-r requirements.txt (line 22)) (4.13.2) 2025-05-07T19:49:58.3308367Z Requirement already satisfied: distro in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from scikit-build->-r requirements.txt (line 24)) (1.9.0) 2025-05-07T19:49:58.3316458Z Requirement already satisfied: wheel>=0.32.0 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from scikit-build->-r requirements.txt (line 24)) (0.45.1) 2025-05-07T19:49:58.3533383Z Requirement already satisfied: zipp>=3.20 in /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages (from importlib-metadata>=4.6->build->-r requirements.txt (line 14)) (3.21.0) 2025-05-07T19:49:58.3727286Z Collecting mypy-extensions>=0.3.0 (from typing-inspect->pyre-extensions->-r requirements.txt (line 22)) 2025-05-07T19:49:58.3766896Z Downloading mypy_extensions-1.1.0-py3-none-any.whl.metadata (1.1 kB) 2025-05-07T19:49:58.3874962Z Downloading backports.tarfile-1.2.0-py3-none-any.whl (30 kB) 2025-05-07T19:49:58.3978796Z Downloading cmake-4.0.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.9 MB) 2025-05-07T19:49:58.5168290Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 27.9/27.9 MB 241.2 MB/s eta 0:00:00 2025-05-07T19:49:58.5211223Z Downloading ninja-1.11.1.4-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (422 kB) 2025-05-07T19:49:58.5301842Z Downloading pyre_extensions-0.0.32-py3-none-any.whl (12 kB) 2025-05-07T19:49:58.5372609Z Downloading setuptools_git_versioning-2.1.0-py3-none-any.whl (10 kB) 2025-05-07T19:49:58.5449691Z Downloading tabulate-0.9.0-py3-none-any.whl (35 kB) 2025-05-07T19:49:58.5520450Z Downloading patchelf-0.17.2.2-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.musllinux_1_1_x86_64.whl (466 kB) 2025-05-07T19:49:58.5619568Z Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB) 2025-05-07T19:49:58.5679561Z 
Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB) 2025-05-07T19:49:58.7496399Z Installing collected packages: tabulate, setuptools_git_versioning, patchelf, ninja, mypy-extensions, cmake, backports.tarfile, typing-inspect, pyre-extensions 2025-05-07T19:49:59.7033608Z 2025-05-07T19:49:59.7099583Z Successfully installed backports.tarfile-1.2.0 cmake-4.0.0 mypy-extensions-1.1.0 ninja-1.11.1.4 patchelf-0.17.2.2 pyre-extensions-0.0.32 setuptools_git_versioning-2.1.0 tabulate-0.9.0 typing-inspect-0.9.0 2025-05-07T19:49:59.7101889Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-05-07T19:49:59.8531374Z ################################################################################ 2025-05-07T19:49:59.8532525Z # Install PyTorch (PyTorch PIP) 2025-05-07T19:49:59.8533297Z # 2025-05-07T19:49:59.8548264Z # [2025-05-07T19:49:59.854Z] + install_triton_pip build_binary 2025-05-07T19:49:59.8549631Z ################################################################################ 2025-05-07T19:49:59.8550082Z 2025-05-07T19:49:59.8550389Z [BUILD] Installing pytorch-triton nightly/3.2.0+git4b3bb1f8 from PIP ... 
2025-05-07T19:49:59.8550873Z ################################################################################ 2025-05-07T19:49:59.8551316Z # Install Package From PyTorch PIP: pytorch-triton 2025-05-07T19:49:59.8551678Z # 2025-05-07T19:49:59.8570823Z # [2025-05-07T19:49:59.856Z] + install_from_pytorch_pip build_binary pytorch-triton nightly/3.2.0+git4b3bb1f8 2025-05-07T19:49:59.8571476Z ################################################################################ 2025-05-07T19:49:59.8571756Z 2025-05-07T19:49:59.8587219Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:49:59.9402444Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:49:59.9418197Z ################################################################################ 2025-05-07T19:49:59.9419348Z # Prepare PIP Arguments (PyTorch PIP) 2025-05-07T19:49:59.9420264Z # 2025-05-07T19:49:59.9420724Z # [2025-05-07T19:49:59.941Z] + __prepare_pip_arguments pytorch-triton nightly/3.2.0+git4b3bb1f8 2025-05-07T19:49:59.9421299Z ################################################################################ 2025-05-07T19:49:59.9421547Z 2025-05-07T19:49:59.9461719Z [INSTALL] Extracted package (channel, version): (nightly, 3.2.0+git4b3bb1f8) 2025-05-07T19:49:59.9475974Z [INSTALL] Using a non-RELEASE channel: nightly ... 2025-05-07T19:49:59.9477673Z [INSTALL] Extracted the full PIP channel: https://download.pytorch.org/whl/nightly/ 2025-05-07T19:49:59.9482267Z [INSTALL] Extracted the full PIP package: --pre pytorch-triton==3.2.0+git4b3bb1f8 2025-05-07T19:49:59.9487188Z [INSTALL] Attempting to install [pytorch-triton, 3.2.0+git4b3bb1f8] from PyTorch PIP using channel https://download.pytorch.org/whl/nightly/ ... 
2025-05-07T19:49:59.9508921Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install --pre pytorch-triton==3.2.0+git4b3bb1f8 --index-url https://download.pytorch.org/whl/nightly/ 2025-05-07T19:50:05.5023956Z ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. 2025-05-07T19:50:05.5025385Z torch 2.8.0.dev20250507+cu128 requires pytorch-triton==3.3.0+git96316ce5; platform_system == "Linux", but you have pytorch-triton 3.2.0+git4b3bb1f8 which is incompatible. 2025-05-07T19:50:05.5027545Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-05-07T19:50:05.5029052Z 2025-05-07T19:50:05.5029262Z Looking in indexes: https://download.pytorch.org/whl/nightly/ 2025-05-07T19:50:05.5029735Z Collecting pytorch-triton==3.2.0+git4b3bb1f8 2025-05-07T19:50:05.5030625Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.2.0%2Bgit4b3bb1f8-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.3 kB) 2025-05-07T19:50:05.5031968Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.2.0%2Bgit4b3bb1f8-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (166.4 MB) 2025-05-07T19:50:05.5033213Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 166.4/166.4 MB 169.0 MB/s eta 0:00:00 2025-05-07T19:50:05.5033623Z Installing collected packages: pytorch-triton 2025-05-07T19:50:05.5034003Z Attempting uninstall: pytorch-triton 2025-05-07T19:50:05.5034429Z Found existing installation: pytorch-triton 3.3.0+git96316ce5 2025-05-07T19:50:05.5034888Z Uninstalling pytorch-triton-3.3.0+git96316ce5: 2025-05-07T19:50:05.5035451Z 
Successfully uninstalled pytorch-triton-3.3.0+git96316ce5 2025-05-07T19:50:05.5035881Z Successfully installed pytorch-triton-3.2.0+git4b3bb1f8 2025-05-07T19:50:05.5036141Z 2025-05-07T19:50:07.6427745Z [CHECK] Python (sub-)package 'triton' found ... 2025-05-07T19:50:07.6428973Z [CHECK] Printing out the pytorch-triton version ... 2025-05-07T19:50:09.7179887Z ################################################################################ 2025-05-07T19:50:09.7180447Z [CHECK] The installed VERSION of pytorch-triton is: 3.2.0 2025-05-07T19:50:09.7180842Z ################################################################################ 2025-05-07T19:50:09.7181098Z 2025-05-07T19:50:11.6920904Z [CHECK] Python (sub-)package 'numpy' found ... 2025-05-07T19:50:13.7444692Z [CHECK] Python (sub-)package 'skbuild' found ... 2025-05-07T19:50:13.7446347Z [BUILD] Successfully ran git submodules update 2025-05-07T19:50:13.7526557Z ##[group]Run . $PRELUDE; cd fbgemm_gpu; build_fbgemm_gpu_package $BUILD_ENV nightly genai/cuda 2025-05-07T19:50:13.7527281Z . 
$PRELUDE; cd fbgemm_gpu; build_fbgemm_gpu_package $BUILD_ENV nightly genai/cuda 2025-05-07T19:50:13.7527867Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:50:13.7528215Z env: 2025-05-07T19:50:13.7528448Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:50:13.7528747Z BUILD_ENV: build_binary 2025-05-07T19:50:13.7529000Z BUILD_TARGET: genai 2025-05-07T19:50:13.7529223Z BUILD_VARIANT: cuda 2025-05-07T19:50:13.7529465Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:50:13.7529709Z ##[endgroup] 2025-05-07T19:50:14.1978220Z [BUILD] BUILD_TARGET_VARIANT: genai/cuda 2025-05-07T19:50:14.1979226Z [BUILD] Extracted build target: genai 2025-05-07T19:50:14.1979580Z [BUILD] Extracted build variant: cuda 2025-05-07T19:50:16.0183685Z /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:50:16.0184307Z 2025-05-07T19:50:16.0946359Z [CHECK] Binary cc found in PATH 2025-05-07T19:50:17.9200630Z /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:50:17.9201011Z 2025-05-07T19:50:17.9779099Z [CHECK] Binary gcc found in PATH 2025-05-07T19:50:19.8021573Z /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:50:19.8021955Z 2025-05-07T19:50:19.8639016Z [CHECK] Binary c++ found in PATH 2025-05-07T19:50:21.6812444Z /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:50:21.6812788Z 2025-05-07T19:50:21.7566221Z [CHECK] Binary g++ found in PATH 2025-05-07T19:50:23.6552176Z [BUILD] Extracted and set Python tag: py39 2025-05-07T19:50:23.6552788Z [BUILD] Extracted and set Python platform name: manylinux_2_28_x86_64 2025-05-07T19:50:23.6801996Z core = 24 2025-05-07T19:50:23.7014431Z sockets = 2 2025-05-07T19:50:23.7014892Z [BUILD] Set multicore run option for setup.py: -j 48 2025-05-07T19:50:23.7015352Z [CHECK] LD_LIBRARY_PATH = 2025-05-07T19:50:23.7015653Z [BUILD] Running pre-build cleanups ... 
2025-05-07T19:50:23.7016057Z + rm -rf dist 2025-05-07T19:50:23.7016198Z 2025-05-07T19:50:23.7027378Z 2025-05-07T19:50:23.7027777Z + conda run --no-capture-output -n build_binary python setup.py clean 2025-05-07T19:50:23.7028118Z 2025-05-07T19:50:26.9523750Z INFO:root:running clean 2025-05-07T19:50:26.9524762Z [SETUP.PY] ARGV: ['setup.py', 'clean'] 2025-05-07T19:50:26.9527997Z [SETUP.PY] Parsed setup.py arguments: Namespace(verbose=False, debug=False, dryrun=False, build_target='default', build_variant='cuda', package_channel='nightly', nvml_lib_path=None, nccl_lib_path=None, use_fb_only=False, cxxprefix=None) 2025-05-07T19:50:26.9529239Z [SETUP.PY] Other arguments: ['clean'] 2025-05-07T19:50:26.9529711Z [SETUP.PY] CUDA CUB directory environment variable not set. Using default CUB location. 2025-05-07T19:50:26.9530308Z [SETUP.PY] Using CUDA = /github/home/miniconda/envs/build_binary 2025-05-07T19:50:26.9530913Z [SETUP.PY] Generating version file at: /__w/FBGEMM/FBGEMM/fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:50:26.9531494Z [SETUP.PY] Setting the FBGEMM build target: default ... 2025-05-07T19:50:26.9531910Z [SETUP.PY] Setting the FBGEMM build variant: cuda ... 2025-05-07T19:50:26.9533101Z [SETUP.PY] Passing CMake arguments: ['-DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch', '-D_GLIBCXX_USE_CXX11_ABI=1', '-DFBGEMM_BUILD_TARGET=default', '-DFBGEMM_BUILD_VARIANT=cuda', "-DCMAKE_C_FLAGS=''", "-DCMAKE_CXX_FLAGS=''"] 2025-05-07T19:50:27.3544376Z 2025-05-07T19:50:27.3544777Z [BUILD] Printing git status ... 2025-05-07T19:50:27.3545125Z + git status 2025-05-07T19:50:27.3545275Z 2025-05-07T19:50:27.9060347Z HEAD detached at pull/4066/merge 2025-05-07T19:50:27.9060688Z Untracked files: 2025-05-07T19:50:27.9061010Z (use "git add <file>..."
to include in what will be committed) 2025-05-07T19:50:27.9061392Z ../build_only/ 2025-05-07T19:50:27.9061607Z ../collect_env.py 2025-05-07T19:50:27.9061861Z fbgemm_gpu/docs/version.py 2025-05-07T19:50:27.9062035Z 2025-05-07T19:50:27.9062535Z nothing added to commit but untracked files present (use "git add" to track) 2025-05-07T19:50:27.9062932Z 2025-05-07T19:50:27.9063018Z + git diff 2025-05-07T19:50:27.9063221Z 2025-05-07T19:50:27.9337716Z 2025-05-07T19:50:27.9338070Z ################################################################################ 2025-05-07T19:50:27.9338521Z # Configure FBGEMM-GPU Build 2025-05-07T19:50:27.9338803Z # 2025-05-07T19:50:27.9355875Z # [2025-05-07T19:50:27.934Z] + __configure_fbgemm_gpu_build 2025-05-07T19:50:27.9356379Z ################################################################################ 2025-05-07T19:50:27.9356624Z 2025-05-07T19:50:27.9359837Z [BUILD] Setting the build target: genai ... 2025-05-07T19:50:27.9360296Z [BUILD] Configuring build as CUDA variant (this is the default behavior) ... 
2025-05-07T19:50:29.7327363Z /github/home/miniconda/envs/build_binary/bin/nvcc 2025-05-07T19:50:29.7327730Z 2025-05-07T19:50:29.7908131Z [CHECK] Binary nvcc found in PATH 2025-05-07T19:50:31.5839690Z /__w/FBGEMM/FBGEMM/build_only/cudnn/include 2025-05-07T19:50:31.5840019Z 2025-05-07T19:50:31.6417295Z [CHECK] Environment variable CUDNN_INCLUDE_DIR is defined in the Conda environment 2025-05-07T19:50:33.4397649Z /__w/FBGEMM/FBGEMM/build_only/cudnn/lib 2025-05-07T19:50:33.4397971Z 2025-05-07T19:50:33.4971579Z [CHECK] Environment variable CUDNN_LIBRARY is defined in the Conda environment 2025-05-07T19:50:35.2988142Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:50:35.2988552Z 2025-05-07T19:50:35.3579199Z [CHECK] Environment variable NVML_LIB_PATH is defined in the Conda environment 2025-05-07T19:50:37.2333207Z [BUILD] Using the default architectures for CUDA nvcc: NVIDIA (R) Cuda compiler driver 2025-05-07T19:50:37.2333797Z Copyright (c) 2005-2025 NVIDIA Corporation 2025-05-07T19:50:37.2334144Z Built on Wed_Jan_15_19:20:09_PST_2025 2025-05-07T19:50:37.2334466Z Cuda compilation tools, release 12.8, V12.8.61 2025-05-07T19:50:37.2334875Z Build cuda_12.8.r12.8/compiler.35404655_0 ... 2025-05-07T19:50:37.2335342Z [BUILD] Setting the following CUDA targets: 7.0;8.0;9.0;9.0a;10.0a;12.0a 2025-05-07T19:50:37.2335771Z [BUILD] Looking up NVML filepath ... 2025-05-07T19:50:39.1580315Z [BUILD] Looking up NCCL filepath ... 2025-05-07T19:50:43.0290135Z [BUILD] Setting NVCC verbose mode ... 2025-05-07T19:50:43.0290685Z + conda env config vars set -n build_binary NVCC_VERBOSE=1 2025-05-07T19:50:43.0290976Z 2025-05-07T19:50:43.4563934Z 2025-05-07T19:50:43.4564293Z [BUILD] Setting CUDA build args ... 2025-05-07T19:50:45.3408985Z [BUILD] Looking up CUDA version ... 
2025-05-07T19:50:49.0522198Z + conda run -n build_binary c++ --version | grep -i clang 2025-05-07T19:50:49.0523142Z 2025-05-07T19:50:50.9306469Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:50:50.9309141Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang++.cfg 2025-05-07T19:50:50.9310543Z 2025-05-07T19:50:50.9310974Z [BUILD] Setting NVCC flags ... 2025-05-07T19:50:50.9312065Z + conda env config vars set -n build_binary NVCC_PREPEND_FLAGS="-std=c++20 -Xcompiler -std=c++20 -Xcompiler -stdlib=libstdc++ -ccbin /github/home/miniconda/envs/build_binary/bin/c++ -allow-unsupported-compiler" 2025-05-07T19:50:50.9312921Z 2025-05-07T19:50:51.3369344Z 2025-05-07T19:50:51.3369797Z + conda run -n build_binary printenv NVCC_PREPEND_FLAGS 2025-05-07T19:50:51.3370100Z 2025-05-07T19:50:53.1770204Z -std=c++20 -Xcompiler -std=c++20 -Xcompiler -stdlib=libstdc++ -ccbin /github/home/miniconda/envs/build_binary/bin/c++ -allow-unsupported-compiler 2025-05-07T19:50:53.1772179Z 2025-05-07T19:50:53.2527343Z 2025-05-07T19:50:53.2527917Z [BUILD] Setting CUDA build args ... 
2025-05-07T19:50:53.2528948Z + conda run -n build_binary c++ --version 2025-05-07T19:50:53.2529608Z 2025-05-07T19:50:55.0967837Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:50:55.0969071Z Target: x86_64-conda-linux-gnu 2025-05-07T19:50:55.0969376Z Thread model: posix 2025-05-07T19:50:55.0969743Z InstalledDir: /github/home/miniconda/envs/build_binary/bin 2025-05-07T19:50:55.0970403Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang++.cfg 2025-05-07T19:50:55.0970930Z 2025-05-07T19:50:55.1561786Z 2025-05-07T19:50:55.1562566Z + conda run -n build_binary c++ --version | grep -i clang 2025-05-07T19:50:55.1563403Z 2025-05-07T19:50:57.0612754Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:50:57.0613806Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang++.cfg 2025-05-07T19:50:57.0614311Z 2025-05-07T19:50:57.0614561Z [BUILD] Clang is available; configuring for Clang-based build ... 2025-05-07T19:50:58.9311365Z .github/scripts/fbgemm_gpu_build.bash: line 370: [: : integer expression expected 2025-05-07T19:50:58.9312978Z [BUILD] Enabling debug features in the build ... 
2025-05-07T19:50:58.9320085Z [BUILD] FBGEMM_GPU build arguments have been set: --verbose --build-target=genai --build-variant=cuda --nvml_lib_path=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so --nccl_lib_path=/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib/libnccl.so.2 -DTORCH_CUDA_ARCH_LIST='7.0;8.0;9.0;9.0a;10.0a;12.0a' -DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCMAKE_CXX_STANDARD=20 --cxxprefix=/github/home/miniconda/envs/build_binary --debug 2025-05-07T19:50:58.9325847Z ################################################################################ 2025-05-07T19:50:58.9326194Z # Build FBGEMM-GPU Package (Wheel) 2025-05-07T19:50:58.9326496Z # 2025-05-07T19:50:58.9330439Z # [2025-05-07T19:50:58.932Z] + build_fbgemm_gpu_package build_binary nightly genai/cuda 2025-05-07T19:50:58.9330966Z ################################################################################ 2025-05-07T19:50:58.9331216Z 2025-05-07T19:50:58.9331449Z [BUILD] Building FBGEMM wheel (TARGET=genai, VARIANT=cuda) ... 
2025-05-07T19:50:58.9337178Z + conda run --no-capture-output -n build_binary python -m build --wheel --no-isolation --config-setting=--build-option=--verbose --config-setting=--build-option=--build-target=genai --config-setting=--build-option=--build-variant=cuda --config-setting=--build-option=--nvml_lib_path=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so --config-setting=--build-option=--nccl_lib_path=/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib/libnccl.so.2 --config-setting=--build-option=-DTORCH_CUDA_ARCH_LIST='7.0;8.0;9.0;9.0a;10.0a;12.0a' --config-setting=--build-option=-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux --config-setting=--build-option=-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux --config-setting=--build-option=-DCMAKE_CXX_STANDARD=20 --config-setting=--build-option=--cxxprefix=/github/home/miniconda/envs/build_binary --config-setting=--build-option=--debug --config-setting=--build-option=--package_channel=nightly --config-setting=--build-option=--python-tag=py39 --config-setting=--build-option=--plat-name=manylinux_2_28_x86_64 2025-05-07T19:50:58.9342304Z 2025-05-07T19:51:00.7848084Z * Getting build dependencies for wheel... 
2025-05-07T19:51:02.2120926Z INFO:root:running egg_info 2025-05-07T19:51:02.2149342Z INFO:root:creating fbgemm_gpu_nightly.egg-info 2025-05-07T19:51:02.2150022Z INFO:root:writing fbgemm_gpu_nightly.egg-info/PKG-INFO 2025-05-07T19:51:02.2154182Z INFO:root:writing dependency_links to fbgemm_gpu_nightly.egg-info/dependency_links.txt 2025-05-07T19:51:02.2156320Z INFO:root:writing requirements to fbgemm_gpu_nightly.egg-info/requires.txt 2025-05-07T19:51:02.2157302Z INFO:root:writing top-level names to fbgemm_gpu_nightly.egg-info/top_level.txt 2025-05-07T19:51:02.2158891Z INFO:root:writing manifest file 'fbgemm_gpu_nightly.egg-info/SOURCES.txt' 2025-05-07T19:51:02.2222827Z INFO:root:reading manifest file 'fbgemm_gpu_nightly.egg-info/SOURCES.txt' 2025-05-07T19:51:02.2240560Z INFO:root:writing manifest file 'fbgemm_gpu_nightly.egg-info/SOURCES.txt' 2025-05-07T19:51:02.2243506Z [SETUP.PY] ARGV: ['setup.py', 'egg_info'] 2025-05-07T19:51:02.2244628Z [SETUP.PY] Parsed setup.py arguments: Namespace(verbose=False, debug=False, dryrun=False, build_target='default', build_variant='cuda', package_channel='nightly', nvml_lib_path=None, nccl_lib_path=None, use_fb_only=False, cxxprefix=None) 2025-05-07T19:51:02.2245746Z [SETUP.PY] Other arguments: ['egg_info'] 2025-05-07T19:51:02.2246243Z [SETUP.PY] CUDA CUB directory environment variable not set. Using default CUB location. 2025-05-07T19:51:02.2246845Z [SETUP.PY] Using CUDA = /github/home/miniconda/envs/build_binary 2025-05-07T19:51:02.2247467Z [SETUP.PY] Generating version file at: /__w/FBGEMM/FBGEMM/fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:51:02.2248080Z [SETUP.PY] Setting the FBGEMM build target: default ... 2025-05-07T19:51:02.2248507Z [SETUP.PY] Setting the FBGEMM build variant: cuda ... 
2025-05-07T19:51:02.2249840Z [SETUP.PY] Passing CMake arguments: ['-DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch', '-D_GLIBCXX_USE_CXX11_ABI=1', '-DFBGEMM_BUILD_TARGET=default', '-DFBGEMM_BUILD_VARIANT=cuda', "-DCMAKE_C_FLAGS=''", "-DCMAKE_CXX_FLAGS=''"] 2025-05-07T19:51:02.5440690Z * Building wheel... 2025-05-07T19:51:03.9692691Z [SETUP.PY] ARGV: ['setup.py', 'bdist_wheel', '--dist-dir', '/__w/FBGEMM/FBGEMM/fbgemm_gpu/dist/.tmp-sk97hdnz', '--verbose', '--build-target=genai', '--build-variant=cuda', '--nvml_lib_path=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so', '--nccl_lib_path=/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib/libnccl.so.2', '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a', '-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCMAKE_CXX_STANDARD=20', '--cxxprefix=/github/home/miniconda/envs/build_binary', '--debug', '--package_channel=nightly', '--python-tag=py39', '--plat-name=manylinux_2_28_x86_64'] 2025-05-07T19:51:03.9697554Z [SETUP.PY] Parsed setup.py arguments: Namespace(verbose=True, debug=True, dryrun=False, build_target='genai', build_variant='cuda', package_channel='nightly', nvml_lib_path='/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so', nccl_lib_path='/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib/libnccl.so.2', use_fb_only=False, cxxprefix='/github/home/miniconda/envs/build_binary') 2025-05-07T19:51:03.9700892Z [SETUP.PY] Other arguments: ['bdist_wheel', '--dist-dir', '/__w/FBGEMM/FBGEMM/fbgemm_gpu/dist/.tmp-sk97hdnz', '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a', '-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', 
'-DCMAKE_CXX_STANDARD=20', '--python-tag=py39', '--plat-name=manylinux_2_28_x86_64'] 2025-05-07T19:51:03.9702779Z [SETUP.PY] CUDA CUB directory environment variable not set. Using default CUB location. 2025-05-07T19:51:03.9703378Z [SETUP.PY] Using CUDA = /github/home/miniconda/envs/build_binary 2025-05-07T19:51:03.9704060Z [SETUP.PY] Generating version file at: /__w/FBGEMM/FBGEMM/fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:51:03.9704602Z [SETUP.PY] Setting the FBGEMM build target: genai ... 2025-05-07T19:51:03.9704994Z [SETUP.PY] Setting the FBGEMM build variant: cuda ... 2025-05-07T19:51:03.9711192Z [SETUP.PY] Passing CMake arguments: ['-DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch', '-D_GLIBCXX_USE_CXX11_ABI=1', '-DCMAKE_VERBOSE_MAKEFILE=ON', '-DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE', '-DFBGEMM_BUILD_TARGET=genai', '-DFBGEMM_BUILD_VARIANT=cuda', '-DNVML_LIB_PATH=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so', '-DNCCL_INCLUDE_DIRS=/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include', '-DNCCL_LIBRARIES=/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib/libnccl.so.2', '-DCMAKE_C_COMPILER=/github/home/miniconda/envs/build_binary/bin/cc', '-DCMAKE_CXX_COMPILER=/github/home/miniconda/envs/build_binary/bin/c++', "-DCMAKE_C_FLAGS='-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include'", "-DCMAKE_CXX_FLAGS='-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include'", '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a', '-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', 
'-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCMAKE_CXX_STANDARD=20'] 2025-05-07T19:51:03.9717083Z 2025-05-07T19:51:03.9717087Z 2025-05-07T19:51:03.9717251Z -------------------------------------------------------------------------------- 2025-05-07T19:51:03.9717771Z -- Trying 'Ninja' generator 2025-05-07T19:51:03.9718029Z -------------------------------- 2025-05-07T19:51:03.9718307Z --------------------------- 2025-05-07T19:51:03.9718541Z ---------------------- 2025-05-07T19:51:03.9718782Z ----------------- 2025-05-07T19:51:03.9718987Z ------------ 2025-05-07T19:51:03.9719201Z ------- 2025-05-07T19:51:03.9719381Z -- 2025-05-07T19:51:04.0147920Z CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required): 2025-05-07T19:51:04.0149568Z Not searching for unused variables given on the command line. 2025-05-07T19:51:04.0151141Z Compatibility with CMake < 3.10 will be removed from a future version of 2025-05-07T19:51:04.0152373Z CMake. 2025-05-07T19:51:04.0152727Z 2025-05-07T19:51:04.0152940Z Update the VERSION argument <min> value. Or, use the <min>...<max> syntax 2025-05-07T19:51:04.0153480Z to tell CMake that the project requires at least <min> but has been updated 2025-05-07T19:51:04.0153963Z to work with policies introduced by <max> or earlier.
2025-05-07T19:51:04.0154202Z 2025-05-07T19:51:04.0154206Z 2025-05-07T19:51:04.0989085Z -- The C compiler identification is Clang 16.0.6 2025-05-07T19:51:04.1076229Z -- Detecting C compiler ABI info 2025-05-07T19:51:04.2347386Z -- Detecting C compiler ABI info - done 2025-05-07T19:51:04.2480584Z -- Check for working C compiler: /github/home/miniconda/envs/build_binary/bin/cc - skipped 2025-05-07T19:51:04.2483282Z -- Detecting C compile features 2025-05-07T19:51:04.2486002Z -- Detecting C compile features - done 2025-05-07T19:51:04.3907156Z -- The CXX compiler identification is Clang 16.0.6 2025-05-07T19:51:04.3978891Z -- Detecting CXX compiler ABI info 2025-05-07T19:51:04.5311156Z -- Detecting CXX compiler ABI info - done 2025-05-07T19:51:04.5449278Z -- Check for working CXX compiler: /github/home/miniconda/envs/build_binary/bin/c++ - skipped 2025-05-07T19:51:04.5450879Z -- Detecting CXX compile features 2025-05-07T19:51:04.5457521Z -- Detecting CXX compile features - done 2025-05-07T19:51:04.5471780Z -- Configuring done (0.6s) 2025-05-07T19:51:04.5523599Z -- Generating done (0.0s) 2025-05-07T19:51:04.5536919Z -- Build files have been written to: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_cmake_test_compile/build 2025-05-07T19:51:04.5580205Z -- 2025-05-07T19:51:04.5580854Z ------- 2025-05-07T19:51:04.5581396Z ------------ 2025-05-07T19:51:04.5581978Z ----------------- 2025-05-07T19:51:04.5582462Z ---------------------- 2025-05-07T19:51:04.5582849Z --------------------------- 2025-05-07T19:51:04.5583095Z -------------------------------- 2025-05-07T19:51:04.5583390Z -- Trying 'Ninja' generator - success 2025-05-07T19:51:04.5584001Z -------------------------------------------------------------------------------- 2025-05-07T19:51:04.5584314Z 2025-05-07T19:51:04.5604090Z Configuring Project 2025-05-07T19:51:04.5604886Z Working directory: 2025-05-07T19:51:04.5605940Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-build 2025-05-07T19:51:04.5607130Z Command: 
2025-05-07T19:51:04.5628626Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/cmake/data/bin/cmake /__w/FBGEMM/FBGEMM/fbgemm_gpu -G Ninja -DCMAKE_MAKE_PROGRAM:FILEPATH=/github/home/miniconda/envs/build_binary/bin/ninja --no-warn-unused-cli -DCMAKE_INSTALL_PREFIX:PATH=/__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install -DPYTHON_VERSION_STRING:STRING=3.9.22 -DSKBUILD:INTERNAL=TRUE -DCMAKE_MODULE_PATH:PATH=/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/skbuild/resources/cmake -DPYTHON_EXECUTABLE:PATH=/github/home/miniconda/envs/build_binary/bin/python -DPYTHON_INCLUDE_DIR:PATH=/github/home/miniconda/envs/build_binary/include/python3.9 -DPYTHON_LIBRARY:PATH=/github/home/miniconda/envs/build_binary/lib/libpython3.9.so -DPython_EXECUTABLE:PATH=/github/home/miniconda/envs/build_binary/bin/python -DPython_ROOT_DIR:PATH=/github/home/miniconda/envs/build_binary -DPython_FIND_REGISTRY:STRING=NEVER -DPython_INCLUDE_DIR:PATH=/github/home/miniconda/envs/build_binary/include/python3.9 -DPython_NumPy_INCLUDE_DIRS:PATH=/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/numpy/_core/include -DPython3_EXECUTABLE:PATH=/github/home/miniconda/envs/build_binary/bin/python -DPython3_ROOT_DIR:PATH=/github/home/miniconda/envs/build_binary -DPython3_FIND_REGISTRY:STRING=NEVER -DPython3_INCLUDE_DIR:PATH=/github/home/miniconda/envs/build_binary/include/python3.9 -DPython3_NumPy_INCLUDE_DIRS:PATH=/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/numpy/_core/include -DCMAKE_MAKE_PROGRAM:FILEPATH=/github/home/miniconda/envs/build_binary/bin/ninja -DCMAKE_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ar -DCMAKE_CXX_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_C_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar 
-DCMAKE_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ranlib -DCMAKE_CXX_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_C_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_LINKER=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-strip -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch -D_GLIBCXX_USE_CXX11_ABI=1 -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DFBGEMM_BUILD_TARGET=genai -DFBGEMM_BUILD_VARIANT=cuda -DNVML_LIB_PATH=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -DNCCL_INCLUDE_DIRS=/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -DNCCL_LIBRARIES=/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib/libnccl.so.2 -DCMAKE_C_COMPILER=/github/home/miniconda/envs/build_binary/bin/cc -DCMAKE_CXX_COMPILER=/github/home/miniconda/envs/build_binary/bin/c++ '-DCMAKE_C_FLAGS='"'"'-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include'"'"'' '-DCMAKE_CXX_FLAGS='"'"'-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include'"'"'' '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a' -DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCMAKE_CXX_STANDARD=20 '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a' 
-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCMAKE_CXX_STANDARD=20 -DCMAKE_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ar -DCMAKE_CXX_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_C_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ranlib -DCMAKE_CXX_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_C_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_LINKER=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-strip -DCMAKE_BUILD_TYPE=Release 2025-05-07T19:51:04.5648136Z 2025-05-07T19:51:04.6099244Z 2025-05-07T19:51:04.6100018Z Not searching for unused variables given on the command line. 
2025-05-07T19:51:04.6101057Z 2025-05-07T19:51:04.6101201Z ================================================================================ 2025-05-07T19:51:04.6101906Z Default C compiler flags 2025-05-07T19:51:04.6102290Z (values may be overridden by CMAKE_CXX_STANDARD and CXX_STANDARD): 2025-05-07T19:51:04.6102693Z 2025-05-07T19:51:04.6103523Z -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include 2025-05-07T19:51:04.6104509Z ================================================================================ 2025-05-07T19:51:04.6104749Z 2025-05-07T19:51:04.6104753Z 2025-05-07T19:51:04.6104756Z 2025-05-07T19:51:04.6104871Z ================================================================================ 2025-05-07T19:51:04.6105184Z Default C++ compiler flags 2025-05-07T19:51:04.6105541Z (values may be overridden by CMAKE_CXX_STANDARD and CXX_STANDARD): 2025-05-07T19:51:04.6105826Z 2025-05-07T19:51:04.6106629Z -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include 2025-05-07T19:51:04.6107616Z ================================================================================ 2025-05-07T19:51:04.6107860Z 2025-05-07T19:51:04.6107863Z 2025-05-07T19:51:04.6107867Z 2025-05-07T19:51:04.6107977Z ================================================================================ 2025-05-07T19:51:04.6108293Z AVX2_FLAGS: 2025-05-07T19:51:04.6108408Z 2025-05-07T19:51:04.6108483Z -mavx2 2025-05-07T19:51:04.6108677Z -mf16c 2025-05-07T19:51:04.6108849Z -mfma 2025-05-07T19:51:04.6109047Z -fopenmp 2025-05-07T19:51:04.6109258Z ================================================================================ 2025-05-07T19:51:04.6109491Z 2025-05-07T19:51:04.6109495Z 
2025-05-07T19:51:04.6109499Z 2025-05-07T19:51:04.6109605Z ================================================================================ 2025-05-07T19:51:04.6109921Z AVX512_FLAGS: 2025-05-07T19:51:04.6110038Z 2025-05-07T19:51:04.6110113Z -mavx2 2025-05-07T19:51:04.6110303Z -mf16c 2025-05-07T19:51:04.6110478Z -mfma 2025-05-07T19:51:04.6110670Z -mavx512f 2025-05-07T19:51:04.6110854Z -mavx512bw 2025-05-07T19:51:04.6111055Z -mavx512dq 2025-05-07T19:51:04.6111239Z -mavx512vl 2025-05-07T19:51:04.6111443Z -fopenmp 2025-05-07T19:51:04.6111650Z ================================================================================ 2025-05-07T19:51:04.6111886Z 2025-05-07T19:51:04.6111890Z 2025-05-07T19:51:04.6111893Z 2025-05-07T19:51:04.6112005Z ================================================================================ 2025-05-07T19:51:04.6112470Z The project is built using scikit-build 2025-05-07T19:51:04.6112782Z ================================================================================ 2025-05-07T19:51:04.6113023Z 2025-05-07T19:51:04.6113027Z 2025-05-07T19:51:04.6113030Z 2025-05-07T19:51:04.6113136Z ================================================================================ 2025-05-07T19:51:04.6113432Z Build Settings 2025-05-07T19:51:04.6113572Z 2025-05-07T19:51:04.6113670Z FBGEMM_BUILD_TARGET : genai 2025-05-07T19:51:04.6113949Z FBGEMM_BUILD_VARIANT : cuda 2025-05-07T19:51:04.6114118Z 2025-05-07T19:51:04.6114208Z NVCC_VERBOSE : 2025-05-07T19:51:04.6114463Z CUDNN_INCLUDE_DIR : 2025-05-07T19:51:04.6114699Z CUDNN_LIBRARY : 2025-05-07T19:51:04.6115126Z NVML_LIB_PATH : /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:51:04.6115581Z TORCH_CUDA_ARCH_LIST : 7.0 2025-05-07T19:51:04.6115832Z 8.0 2025-05-07T19:51:04.6116006Z 9.0 2025-05-07T19:51:04.6116198Z 9.0a 2025-05-07T19:51:04.6116388Z 10.0a 2025-05-07T19:51:04.6116563Z 12.0a 2025-05-07T19:51:04.6116665Z 2025-05-07T19:51:04.6116772Z HIP_ROOT_DIR : 2025-05-07T19:51:04.6117011Z 
HIPCC_VERBOSE : 2025-05-07T19:51:04.6117265Z AMDGPU_TARGETS : 2025-05-07T19:51:04.6117501Z PYTORCH_ROCM_ARCH : 2025-05-07T19:51:04.6117874Z ================================================================================ 2025-05-07T19:51:04.6118093Z 2025-05-07T19:51:04.7608126Z -- The CXX compiler identification is Clang 16.0.6 2025-05-07T19:51:04.8325524Z -- The C compiler identification is Clang 16.0.6 2025-05-07T19:51:05.9060688Z -- The CUDA compiler identification is NVIDIA 12.8.61 with host compiler Clang 16.0.6 2025-05-07T19:51:05.9170757Z -- Detecting CXX compiler ABI info 2025-05-07T19:51:06.0480207Z -- Detecting CXX compiler ABI info - done 2025-05-07T19:51:06.0616856Z -- Check for working CXX compiler: /github/home/miniconda/envs/build_binary/bin/c++ - skipped 2025-05-07T19:51:06.0618753Z -- Detecting CXX compile features 2025-05-07T19:51:06.0623975Z -- Detecting CXX compile features - done 2025-05-07T19:51:06.0698988Z -- Detecting C compiler ABI info 2025-05-07T19:51:06.1939636Z -- Detecting C compiler ABI info - done 2025-05-07T19:51:06.2069477Z -- Check for working C compiler: /github/home/miniconda/envs/build_binary/bin/cc - skipped 2025-05-07T19:51:06.2072402Z -- Detecting C compile features 2025-05-07T19:51:06.2078195Z -- Detecting C compile features - done 2025-05-07T19:51:06.2127544Z -- Detecting CUDA compiler ABI info 2025-05-07T19:51:07.2373942Z -- Detecting CUDA compiler ABI info - done 2025-05-07T19:51:07.2911955Z -- Check for working CUDA compiler: /github/home/miniconda/envs/build_binary/bin/nvcc - skipped 2025-05-07T19:51:07.2937514Z -- Detecting CUDA compile features 2025-05-07T19:51:07.2938584Z -- Detecting CUDA compile features - done 2025-05-07T19:51:07.2962240Z -- Performing Test C_HAS_AVX_1 2025-05-07T19:51:07.5859659Z -- Performing Test C_HAS_AVX_1 - Failed 2025-05-07T19:51:07.5860692Z -- Performing Test C_HAS_AVX_2 2025-05-07T19:51:07.9200124Z -- Performing Test C_HAS_AVX_2 - Success 2025-05-07T19:51:07.9201144Z -- Performing Test 
C_HAS_AVX2_1 2025-05-07T19:51:08.2078022Z -- Performing Test C_HAS_AVX2_1 - Failed 2025-05-07T19:51:08.2079043Z -- Performing Test C_HAS_AVX2_2 2025-05-07T19:51:08.5376066Z -- Performing Test C_HAS_AVX2_2 - Success 2025-05-07T19:51:08.5377307Z -- Performing Test C_HAS_AVX512_1 2025-05-07T19:51:08.8256204Z -- Performing Test C_HAS_AVX512_1 - Failed 2025-05-07T19:51:08.8257460Z -- Performing Test C_HAS_AVX512_2 2025-05-07T19:51:09.1570342Z -- Performing Test C_HAS_AVX512_2 - Success 2025-05-07T19:51:09.1571049Z -- Performing Test CXX_HAS_AVX_1 2025-05-07T19:51:09.4470827Z -- Performing Test CXX_HAS_AVX_1 - Failed 2025-05-07T19:51:09.4471879Z -- Performing Test CXX_HAS_AVX_2 2025-05-07T19:51:09.7811323Z -- Performing Test CXX_HAS_AVX_2 - Success 2025-05-07T19:51:09.7812477Z -- Performing Test CXX_HAS_AVX2_1 2025-05-07T19:51:10.0691913Z -- Performing Test CXX_HAS_AVX2_1 - Failed 2025-05-07T19:51:10.0692321Z -- Performing Test CXX_HAS_AVX2_2 2025-05-07T19:51:10.4021042Z -- Performing Test CXX_HAS_AVX2_2 - Success 2025-05-07T19:51:10.4021872Z -- Performing Test CXX_HAS_AVX512_1 2025-05-07T19:51:10.6887482Z -- Performing Test CXX_HAS_AVX512_1 - Failed 2025-05-07T19:51:10.6888605Z -- Performing Test CXX_HAS_AVX512_2 2025-05-07T19:51:11.0228327Z -- Performing Test CXX_HAS_AVX512_2 - Success 2025-05-07T19:51:11.0409879Z -- Found CUDA: /github/home/miniconda/envs/build_binary/targets/x86_64-linux (found version "12.8") 2025-05-07T19:51:11.0446056Z -- Found CUDAToolkit: /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include (found version "12.8.61") 2025-05-07T19:51:11.0527840Z -- Performing Test CMAKE_HAVE_LIBC_PTHREAD 2025-05-07T19:51:11.1785847Z -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success 2025-05-07T19:51:11.1796038Z -- Found Threads: TRUE 2025-05-07T19:51:11.3449148Z -- PyTorch: CUDA detected: 12.8 2025-05-07T19:51:11.3450753Z -- PyTorch: CUDA nvcc is: /github/home/miniconda/envs/build_binary/targets/x86_64-linux/bin/nvcc 
2025-05-07T19:51:11.3452366Z -- PyTorch: CUDA toolkit directory: /github/home/miniconda/envs/build_binary/targets/x86_64-linux 2025-05-07T19:51:11.5122346Z -- PyTorch: Header version is: 12.8 2025-05-07T19:51:11.6923764Z -- Found Python: /github/home/miniconda/envs/build_binary/bin/python (found version "3.9.22") found components: Interpreter 2025-05-07T19:51:11.6936622Z CMake Warning at /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:140 (message): 2025-05-07T19:51:11.6939312Z Failed to compute shorthash for libnvrtc.so 2025-05-07T19:51:11.6940322Z Call Stack (most recent call first): 2025-05-07T19:51:11.6942409Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include) 2025-05-07T19:51:11.6943596Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package) 2025-05-07T19:51:11.6944622Z /__w/FBGEMM/FBGEMM/cmake/modules/PyTorchSetup.cmake:14 (find_package) 2025-05-07T19:51:11.6945079Z CMakeLists.txt:112 (include) 2025-05-07T19:51:11.6945407Z 2025-05-07T19:51:11.6945412Z 2025-05-07T19:51:11.6945571Z -- USE_CUDNN is set to 0. Compiling without cuDNN support 2025-05-07T19:51:11.6946036Z -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support 2025-05-07T19:51:11.6946508Z -- USE_CUDSS is set to 0. Compiling without cuDSS support 2025-05-07T19:51:11.6946942Z -- USE_CUFILE is set to 0. 
Compiling without cuFile support 2025-05-07T19:51:11.6948024Z -- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_90a,code=sm_90a;-gencode;arch=compute_100a,code=sm_100a;-gencode;arch=compute_120a,code=sm_120a 2025-05-07T19:51:11.7289489Z CMake Warning at /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message): 2025-05-07T19:51:11.7290402Z static library kineto_LIBRARY-NOTFOUND not found. 2025-05-07T19:51:11.7290786Z Call Stack (most recent call first): 2025-05-07T19:51:11.7291614Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:125 (append_torchlib_if_found) 2025-05-07T19:51:11.7292856Z /__w/FBGEMM/FBGEMM/cmake/modules/PyTorchSetup.cmake:14 (find_package) 2025-05-07T19:51:11.7293646Z CMakeLists.txt:112 (include) 2025-05-07T19:51:11.7293840Z 2025-05-07T19:51:11.7293845Z 2025-05-07T19:51:11.7293861Z 2025-05-07T19:51:11.7293866Z 2025-05-07T19:51:11.7294278Z -- Found Torch: /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libtorch.so 2025-05-07T19:51:11.7294917Z ================================================================================ 2025-05-07T19:51:11.7295288Z PyTorch Flags: 2025-05-07T19:51:11.7295518Z 2025-05-07T19:51:11.7295762Z TORCH_INCLUDE_DIRS: 2025-05-07T19:51:11.7296486Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include 2025-05-07T19:51:11.7297442Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:51:11.7298148Z 2025-05-07T19:51:11.7298364Z TORCH_LIBRARIES: 2025-05-07T19:51:11.7298635Z torch 2025-05-07T19:51:11.7298849Z torch_library 2025-05-07T19:51:11.7299316Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10.so 2025-05-07T19:51:11.7300008Z 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:51:11.7300745Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:51:11.7301308Z 2025-05-07T19:51:11.7301521Z TORCH_CUDA_OPTIONS: 2025-05-07T19:51:11.7301806Z --expt-relaxed-constexpr 2025-05-07T19:51:11.7302093Z -D__CUDA_NO_HALF_OPERATORS__ 2025-05-07T19:51:11.7302417Z -D__CUDA_NO_BFLOAT16_CONVERSIONS__ 2025-05-07T19:51:11.7302734Z -D__CUDA_NO_HALF2_OPERATORS__ 2025-05-07T19:51:11.7303103Z ================================================================================ 2025-05-07T19:51:11.7303353Z 2025-05-07T19:51:11.7303357Z 2025-05-07T19:51:11.7303362Z 2025-05-07T19:51:11.7303513Z ================================================================================ 2025-05-07T19:51:11.7303966Z NCCL Flags 2025-05-07T19:51:11.7304128Z 2025-05-07T19:51:11.7304515Z NCCL_INCLUDE_DIRS=/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include 2025-05-07T19:51:11.7305435Z NCCL_LIBRARIES=/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:51:11.7306125Z ================================================================================ 2025-05-07T19:51:11.7306368Z 2025-05-07T19:51:11.7306373Z 2025-05-07T19:51:11.7306377Z 2025-05-07T19:51:11.7306535Z ================================================================================ 2025-05-07T19:51:11.7306880Z CUDA Driver Path 2025-05-07T19:51:11.7307054Z 2025-05-07T19:51:11.7307429Z CUDA_DRIVER_LIBRARIES=/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:51:11.7308027Z ================================================================================ 2025-05-07T19:51:11.7308297Z 2025-05-07T19:51:11.7308602Z -- Found NVML_LIB_PATH: /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:51:11.7325591Z 
2025-05-07T19:51:11.7325697Z 2025-05-07T19:51:11.7325942Z ================================================================================ 2025-05-07T19:51:11.7326363Z GPU CPP Library Target: asmjit (SHARED) 2025-05-07T19:51:11.7326923Z 2025-05-07T19:51:11.7327197Z CPU_SRCS: 2025-05-07T19:51:11.7327321Z 2025-05-07T19:51:11.7327407Z 2025-05-07T19:51:11.7327649Z GPU_SRCS: 2025-05-07T19:51:11.7327770Z 2025-05-07T19:51:11.7327857Z 2025-05-07T19:51:11.7328100Z CUDA_SPECIFIC_SRCS: 2025-05-07T19:51:11.7328254Z 2025-05-07T19:51:11.7328365Z 2025-05-07T19:51:11.7328600Z HIP_SPECIFIC_SRCS: 2025-05-07T19:51:11.7328753Z 2025-05-07T19:51:11.7328871Z 2025-05-07T19:51:11.7329071Z OTHER_SRCS: 2025-05-07T19:51:11.7329500Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64assembler.cpp 2025-05-07T19:51:11.7330151Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64builder.cpp 2025-05-07T19:51:11.7330816Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64compiler.cpp 2025-05-07T19:51:11.7331452Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64emithelper.cpp 2025-05-07T19:51:11.7332116Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64formatter.cpp 2025-05-07T19:51:11.7332872Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64func.cpp 2025-05-07T19:51:11.7333466Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64instapi.cpp 2025-05-07T19:51:11.7334275Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64instdb.cpp 2025-05-07T19:51:11.7334891Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64operand.cpp 2025-05-07T19:51:11.7335520Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64rapass.cpp 2025-05-07T19:51:11.7336128Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/armformatter.cpp 2025-05-07T19:51:11.7336781Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/archtraits.cpp 
2025-05-07T19:51:11.7337742Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/assembler.cpp 2025-05-07T19:51:11.7338453Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/builder.cpp 2025-05-07T19:51:11.7339100Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/codeholder.cpp 2025-05-07T19:51:11.7339738Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/codewriter.cpp 2025-05-07T19:51:11.7340389Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/compiler.cpp 2025-05-07T19:51:11.7341041Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/constpool.cpp 2025-05-07T19:51:11.7341648Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/cpuinfo.cpp 2025-05-07T19:51:11.7342284Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/emithelper.cpp 2025-05-07T19:51:11.7342888Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/emitter.cpp 2025-05-07T19:51:11.7343613Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/emitterutils.cpp 2025-05-07T19:51:11.7344257Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/environment.cpp 2025-05-07T19:51:11.7344917Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/errorhandler.cpp 2025-05-07T19:51:11.7345574Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/formatter.cpp 2025-05-07T19:51:11.7346166Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/func.cpp 2025-05-07T19:51:11.7346825Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/funcargscontext.cpp 2025-05-07T19:51:11.7347462Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/globals.cpp 2025-05-07T19:51:11.7348071Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/inst.cpp 2025-05-07T19:51:11.7348654Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/instdb.cpp 2025-05-07T19:51:11.7349293Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/jitallocator.cpp 2025-05-07T19:51:11.7350024Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/jitruntime.cpp 2025-05-07T19:51:11.7350769Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/logger.cpp 2025-05-07T19:51:11.7351345Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/operand.cpp 2025-05-07T19:51:11.7351950Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/osutils.cpp 2025-05-07T19:51:11.7352531Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/ralocal.cpp 2025-05-07T19:51:11.7353142Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/rapass.cpp 2025-05-07T19:51:11.7353744Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/rastack.cpp 2025-05-07T19:51:11.7354319Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/string.cpp 2025-05-07T19:51:11.7354921Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/support.cpp 2025-05-07T19:51:11.7355500Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/target.cpp 2025-05-07T19:51:11.7356091Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/type.cpp 2025-05-07T19:51:11.7356682Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/virtmem.cpp 2025-05-07T19:51:11.7357251Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zone.cpp 2025-05-07T19:51:11.7357851Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonehash.cpp 2025-05-07T19:51:11.7358446Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonelist.cpp 2025-05-07T19:51:11.7359173Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonestack.cpp 2025-05-07T19:51:11.7359786Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonetree.cpp 2025-05-07T19:51:11.7360420Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonevector.cpp 
2025-05-07T19:51:11.7361064Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86assembler.cpp 2025-05-07T19:51:11.7361672Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86builder.cpp 2025-05-07T19:51:11.7362301Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86compiler.cpp 2025-05-07T19:51:11.7362915Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86emithelper.cpp 2025-05-07T19:51:11.7363560Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86formatter.cpp 2025-05-07T19:51:11.7364154Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86func.cpp 2025-05-07T19:51:11.7364770Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86instapi.cpp 2025-05-07T19:51:11.7365385Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86instdb.cpp 2025-05-07T19:51:11.7365975Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86operand.cpp 2025-05-07T19:51:11.7366588Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86rapass.cpp 2025-05-07T19:51:11.7367093Z 2025-05-07T19:51:11.7367538Z CC_FLAGS: 2025-05-07T19:51:11.7367843Z 2025-05-07T19:51:11.7367933Z 2025-05-07T19:51:11.7368210Z NVCC_FLAGS: 2025-05-07T19:51:11.7368344Z 2025-05-07T19:51:11.7368474Z 2025-05-07T19:51:11.7368687Z HIPCC_FLAGS: 2025-05-07T19:51:11.7368827Z 2025-05-07T19:51:11.7368946Z 2025-05-07T19:51:11.7369146Z INCLUDE_DIRS: 2025-05-07T19:51:11.7369426Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:11.7369760Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:51:11.7370081Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:51:11.7370410Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:11.7370956Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include 2025-05-07T19:51:11.7371791Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:51:11.7372456Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:51:11.7372917Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:51:11.7373371Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:51:11.7374002Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:51:11.7374564Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 2025-05-07T19:51:11.7375041Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:51:11.7375631Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include 2025-05-07T19:51:11.7376141Z 2025-05-07T19:51:11.7376382Z Selected Source Files: 2025-05-07T19:51:11.7376549Z 2025-05-07T19:51:11.7376645Z 2025-05-07T19:51:11.7376891Z HIPified Source Files: 2025-05-07T19:51:11.7377126Z 2025-05-07T19:51:11.7377216Z 2025-05-07T19:51:11.7377648Z Library Dependencies: 2025-05-07T19:51:11.7377897Z torch 2025-05-07T19:51:11.7378185Z torch_library 2025-05-07T19:51:11.7378676Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10.so 2025-05-07T19:51:11.7379376Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:51:11.7380126Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:51:11.7380945Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:51:11.7381737Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:51:11.7382387Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:51:11.7382816Z 2025-05-07T19:51:11.7383069Z Output Library: 2025-05-07T19:51:11.7383462Z asmjit 2025-05-07T19:51:11.7383704Z 2025-05-07T19:51:11.7383913Z Destination Directory: 2025-05-07T19:51:11.7384191Z fbgemm_gpu 2025-05-07T19:51:11.7384439Z 
================================================================================ 2025-05-07T19:51:11.7384711Z 2025-05-07T19:51:11.7384715Z 2025-05-07T19:51:11.7384723Z 2025-05-07T19:51:11.7384844Z ================================================================================ 2025-05-07T19:51:11.7385233Z GPU CPP Library Target: fbgemm (SHARED) 2025-05-07T19:51:11.7385535Z 2025-05-07T19:51:11.7385764Z CPU_SRCS: 2025-05-07T19:51:11.7385884Z 2025-05-07T19:51:11.7385973Z 2025-05-07T19:51:11.7386197Z GPU_SRCS: 2025-05-07T19:51:11.7386319Z 2025-05-07T19:51:11.7386409Z 2025-05-07T19:51:11.7386641Z CUDA_SPECIFIC_SRCS: 2025-05-07T19:51:11.7386791Z 2025-05-07T19:51:11.7386877Z 2025-05-07T19:51:11.7387117Z HIP_SPECIFIC_SRCS: 2025-05-07T19:51:11.7387269Z 2025-05-07T19:51:11.7387356Z 2025-05-07T19:51:11.7387592Z OTHER_SRCS: 2025-05-07T19:51:11.7387908Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDM.cc 2025-05-07T19:51:11.7388368Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAutovec.cc 2025-05-07T19:51:11.7388872Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMNBit.cc 2025-05-07T19:51:11.7389306Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/QuantUtils.cc 2025-05-07T19:51:11.7389967Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/RefImplementations.cc 2025-05-07T19:51:11.7390433Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/RowWiseSparseAdagradFused.cc 2025-05-07T19:51:11.7390914Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/SparseAdagrad.cc 2025-05-07T19:51:11.7391285Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/Utils.cc 2025-05-07T19:51:11.7391703Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAvx2.cc 2025-05-07T19:51:11.7392151Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/QuantUtilsAvx2.cc 2025-05-07T19:51:11.7392570Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAvx2.cc 2025-05-07T19:51:11.7393008Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/QuantUtilsAvx2.cc 2025-05-07T19:51:11.7393435Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAvx512.cc 2025-05-07T19:51:11.7393820Z 2025-05-07T19:51:11.7394011Z 
CC_FLAGS: 2025-05-07T19:51:11.7394156Z 2025-05-07T19:51:11.7394239Z 2025-05-07T19:51:11.7394431Z NVCC_FLAGS: 2025-05-07T19:51:11.7394576Z 2025-05-07T19:51:11.7394657Z 2025-05-07T19:51:11.7394870Z HIPCC_FLAGS: 2025-05-07T19:51:11.7394994Z 2025-05-07T19:51:11.7395080Z 2025-05-07T19:51:11.7395300Z INCLUDE_DIRS: 2025-05-07T19:51:11.7395526Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:11.7395847Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:51:11.7396122Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:51:11.7396434Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:11.7396897Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include 2025-05-07T19:51:11.7397655Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:51:11.7398294Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:51:11.7398692Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:51:11.7399140Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:51:11.7399592Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:51:11.7400120Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 2025-05-07T19:51:11.7400565Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:51:11.7401116Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include 2025-05-07T19:51:11.7401621Z 2025-05-07T19:51:11.7401820Z Selected Source Files: 2025-05-07T19:51:11.7401974Z 2025-05-07T19:51:11.7402079Z 2025-05-07T19:51:11.7402282Z HIPified Source Files: 2025-05-07T19:51:11.7402455Z 2025-05-07T19:51:11.7402535Z 2025-05-07T19:51:11.7402731Z Library Dependencies: 2025-05-07T19:51:11.7402983Z torch 2025-05-07T19:51:11.7403175Z torch_library 2025-05-07T19:51:11.7403694Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10.so 
2025-05-07T19:51:11.7404370Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:51:11.7405030Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:51:11.7405807Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:51:11.7406518Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:51:11.7407003Z asmjit 2025-05-07T19:51:11.7407324Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:51:11.7407739Z 2025-05-07T19:51:11.7407940Z Output Library: 2025-05-07T19:51:11.7408177Z fbgemm 2025-05-07T19:51:11.7408391Z 2025-05-07T19:51:11.7408595Z Destination Directory: 2025-05-07T19:51:11.7408854Z fbgemm_gpu 2025-05-07T19:51:11.7409096Z ================================================================================ 2025-05-07T19:51:11.7409326Z 2025-05-07T19:51:11.7409357Z 2025-05-07T19:51:11.7409360Z 2025-05-07T19:51:11.7409479Z ================================================================================ 2025-05-07T19:51:11.7409812Z Running code generation script ... 
2025-05-07T19:51:11.7410719Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_backward_split.py --opensource 2025-05-07T19:51:11.7411547Z ================================================================================ 2025-05-07T19:51:11.7411774Z 2025-05-07T19:51:12.3699754Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:51:12.3701835Z [GENERAATE BACKWARD SPLIT]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_backward_split.py', '--opensource'] 2025-05-07T19:51:12.3702720Z Written: gen_embedding_backward_dense_split_weighted_vbe_cuda.cu 2025-05-07T19:51:12.3703204Z Written: gen_embedding_backward_dense_split_weighted_cuda.cu 2025-05-07T19:51:12.3703707Z Written: gen_embedding_backward_dense_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:12.3704213Z Written: gen_embedding_backward_dense_split_unweighted_vbe_cuda.cu 2025-05-07T19:51:12.3795997Z Written: gen_embedding_backward_dense_split_unweighted_cuda.cu 2025-05-07T19:51:12.3796506Z Written: gen_embedding_backward_dense_split_weighted_vbe_meta.cpp 2025-05-07T19:51:12.3796996Z Written: gen_embedding_backward_dense_split_weighted_meta.cpp 2025-05-07T19:51:12.3797487Z Written: gen_embedding_backward_dense_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:12.3798186Z Written: gen_embedding_backward_dense_split_unweighted_vbe_meta.cpp 2025-05-07T19:51:12.3798692Z Written: gen_embedding_backward_dense_split_unweighted_meta.cpp 2025-05-07T19:51:12.3799221Z Written: gen_embedding_backward_dense_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:12.3799744Z Written: gen_embedding_backward_dense_split_weighted_kernel_cta.cu 2025-05-07T19:51:12.3800302Z Written: gen_embedding_backward_dense_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:12.3800865Z Written: gen_embedding_backward_dense_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:12.3801420Z Written: 
gen_embedding_backward_dense_split_unweighted_kernel_cta.cu 2025-05-07T19:51:12.3801967Z Written: gen_embedding_backward_dense_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:12.3802504Z Written: gen_embedding_backward_dense_split_weighted_kernel_warp.cu 2025-05-07T19:51:12.3803057Z Written: gen_embedding_backward_dense_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:12.3803624Z Written: gen_embedding_backward_dense_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:12.3804185Z Written: gen_embedding_backward_dense_split_unweighted_kernel_warp.cu 2025-05-07T19:51:12.3804694Z Written: gen_embedding_optimizer_dense_split_device_kernel.cuh 2025-05-07T19:51:12.3805139Z Written: gen_embedding_backward_split_dense.cpp 2025-05-07T19:51:12.3805539Z Written: gen_embedding_backward_dense_split_cpu.cpp 2025-05-07T19:51:12.3806187Z Written: gen_embedding_backward_adagrad_split_weighted_cuda.cu 2025-05-07T19:51:12.3806715Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:12.3807226Z Written: gen_embedding_backward_adagrad_split_unweighted_cuda.cu 2025-05-07T19:51:12.3807729Z Written: gen_embedding_backward_adagrad_split_weighted_meta.cpp 2025-05-07T19:51:12.3808246Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:12.3808778Z Written: gen_embedding_backward_adagrad_split_unweighted_meta.cpp 2025-05-07T19:51:12.3809300Z Written: gen_embedding_backward_adagrad_split_weighted_kernel_cta.cu 2025-05-07T19:51:12.3809844Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:12.3810421Z Written: gen_embedding_backward_adagrad_split_unweighted_kernel_cta.cu 2025-05-07T19:51:12.3810945Z Written: gen_embedding_backward_adagrad_split_weighted_kernel_warp.cu 2025-05-07T19:51:12.3811622Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:12.3812162Z Written: 
gen_embedding_backward_adagrad_split_unweighted_kernel_warp.cu 2025-05-07T19:51:12.3812667Z Written: gen_embedding_optimizer_adagrad_split_device_kernel.cuh 2025-05-07T19:51:12.3813095Z Written: gen_embedding_backward_split_adagrad.cpp 2025-05-07T19:51:12.3813572Z Written: gen_embedding_split_adagrad_pt2_autograd.cpp 2025-05-07T19:51:12.3814020Z Written: gen_embedding_backward_split_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:51:12.3814405Z Written: lookup_adagrad.py 2025-05-07T19:51:12.3814728Z Written: gen_embedding_backward_adagrad_split_cpu.cpp 2025-05-07T19:51:12.3815112Z Written: gen_embedding_backward_split_adagrad_cpu.cpp 2025-05-07T19:51:12.3815557Z Written: gen_embedding_backward_split_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:51:12.3816023Z Written: gen_embedding_backward_adam_split_weighted_vbe_cuda.cu 2025-05-07T19:51:12.3816486Z Written: gen_embedding_backward_adam_split_weighted_cuda.cu 2025-05-07T19:51:12.3817070Z Written: gen_embedding_backward_adam_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:12.3817758Z Written: gen_embedding_backward_adam_split_unweighted_vbe_cuda.cu 2025-05-07T19:51:12.3818313Z Written: gen_embedding_backward_adam_split_unweighted_cuda.cu 2025-05-07T19:51:12.3818798Z Written: gen_embedding_backward_adam_split_weighted_vbe_meta.cpp 2025-05-07T19:51:12.3819303Z Written: gen_embedding_backward_adam_split_weighted_meta.cpp 2025-05-07T19:51:12.3819801Z Written: gen_embedding_backward_adam_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:12.3820346Z Written: gen_embedding_backward_adam_split_unweighted_vbe_meta.cpp 2025-05-07T19:51:12.3820863Z Written: gen_embedding_backward_adam_split_unweighted_meta.cpp 2025-05-07T19:51:12.3821364Z Written: gen_embedding_backward_adam_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:12.3821901Z Written: gen_embedding_backward_adam_split_weighted_kernel_cta.cu 2025-05-07T19:51:12.3822435Z Written: gen_embedding_backward_adam_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:12.3823014Z 
Written: gen_embedding_backward_adam_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:12.3823553Z Written: gen_embedding_backward_adam_split_unweighted_kernel_cta.cu 2025-05-07T19:51:12.3824100Z Written: gen_embedding_backward_adam_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:12.3824637Z Written: gen_embedding_backward_adam_split_weighted_kernel_warp.cu 2025-05-07T19:51:12.3825180Z Written: gen_embedding_backward_adam_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:12.3825774Z Written: gen_embedding_backward_adam_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:12.3826316Z Written: gen_embedding_backward_adam_split_unweighted_kernel_warp.cu 2025-05-07T19:51:12.3826836Z Written: gen_embedding_optimizer_adam_split_device_kernel.cuh 2025-05-07T19:51:12.3827262Z Written: gen_embedding_backward_split_adam.cpp 2025-05-07T19:51:12.3827661Z Written: gen_embedding_split_adam_pt2_autograd.cpp 2025-05-07T19:51:12.3828225Z Written: gen_embedding_backward_split_adam_pt2_cuda_wrapper.cpp 2025-05-07T19:51:12.3828623Z Written: lookup_adam.py 2025-05-07T19:51:12.3828941Z Written: gen_embedding_backward_split_adam_cpu.cpp 2025-05-07T19:51:12.3829378Z Written: gen_embedding_backward_split_adam_pt2_cpu_wrapper.cpp 2025-05-07T19:51:12.3829983Z Written: gen_embedding_backward_lamb_split_weighted_cuda.cu 2025-05-07T19:51:12.3830447Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:12.3830931Z Written: gen_embedding_backward_lamb_split_unweighted_cuda.cu 2025-05-07T19:51:12.3831386Z Written: gen_embedding_backward_lamb_split_weighted_meta.cpp 2025-05-07T19:51:12.3831836Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:12.3832326Z Written: gen_embedding_backward_lamb_split_unweighted_meta.cpp 2025-05-07T19:51:12.3832782Z Written: gen_embedding_backward_lamb_split_weighted_kernel_cta.cu 2025-05-07T19:51:12.3833292Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_kernel_cta.cu 
2025-05-07T19:51:12.3833800Z Written: gen_embedding_backward_lamb_split_unweighted_kernel_cta.cu 2025-05-07T19:51:12.3834290Z Written: gen_embedding_backward_lamb_split_weighted_kernel_warp.cu 2025-05-07T19:51:12.3834804Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:12.3835312Z Written: gen_embedding_backward_lamb_split_unweighted_kernel_warp.cu 2025-05-07T19:51:12.3835878Z Written: gen_embedding_optimizer_lamb_split_device_kernel.cuh 2025-05-07T19:51:12.3836423Z Written: gen_embedding_backward_split_lamb.cpp 2025-05-07T19:51:12.3836791Z Written: gen_embedding_split_lamb_pt2_autograd.cpp 2025-05-07T19:51:12.3837198Z Written: gen_embedding_backward_split_lamb_pt2_cuda_wrapper.cpp 2025-05-07T19:51:12.3837589Z Written: lookup_lamb.py 2025-05-07T19:51:12.3837865Z Written: gen_embedding_backward_split_lamb_cpu.cpp 2025-05-07T19:51:12.3838284Z Written: gen_embedding_backward_split_lamb_pt2_cpu_wrapper.cpp 2025-05-07T19:51:12.3838734Z Written: gen_embedding_backward_lars_sgd_split_weighted_cuda.cu 2025-05-07T19:51:12.3839212Z Written: gen_embedding_backward_lars_sgd_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:12.3839719Z Written: gen_embedding_backward_lars_sgd_split_unweighted_cuda.cu 2025-05-07T19:51:12.3840178Z Written: gen_embedding_backward_lars_sgd_split_weighted_meta.cpp 2025-05-07T19:51:12.3840674Z Written: gen_embedding_backward_lars_sgd_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:12.3841175Z Written: gen_embedding_backward_lars_sgd_split_unweighted_meta.cpp 2025-05-07T19:51:12.3841674Z Written: gen_embedding_backward_lars_sgd_split_weighted_kernel_cta.cu 2025-05-07T19:51:12.3842212Z Written: gen_embedding_backward_lars_sgd_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:12.3842770Z Written: gen_embedding_backward_lars_sgd_split_unweighted_kernel_cta.cu 2025-05-07T19:51:12.3843322Z Written: gen_embedding_backward_lars_sgd_split_weighted_kernel_warp.cu 2025-05-07T19:51:12.3843871Z Written: 
gen_embedding_backward_lars_sgd_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:12.3844451Z Written: gen_embedding_backward_lars_sgd_split_unweighted_kernel_warp.cu 2025-05-07T19:51:12.3844945Z Written: gen_embedding_optimizer_lars_sgd_split_device_kernel.cuh 2025-05-07T19:51:12.3845366Z Written: gen_embedding_backward_split_lars_sgd.cpp 2025-05-07T19:51:12.3845786Z Written: gen_embedding_split_lars_sgd_pt2_autograd.cpp 2025-05-07T19:51:12.3846244Z Written: gen_embedding_backward_split_lars_sgd_pt2_cuda_wrapper.cpp 2025-05-07T19:51:12.3846665Z Written: lookup_lars_sgd.py 2025-05-07T19:51:12.3846989Z Written: gen_embedding_backward_split_lars_sgd_cpu.cpp 2025-05-07T19:51:12.3847463Z Written: gen_embedding_backward_split_lars_sgd_pt2_cpu_wrapper.cpp 2025-05-07T19:51:12.3848005Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_cuda.cu 2025-05-07T19:51:12.3848584Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:12.3849154Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_cuda.cu 2025-05-07T19:51:12.3849759Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_meta.cpp 2025-05-07T19:51:12.3850351Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:12.3850960Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_meta.cpp 2025-05-07T19:51:12.3851527Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_kernel_cta.cu 2025-05-07T19:51:12.3852194Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:12.3852840Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_kernel_cta.cu 2025-05-07T19:51:12.3853479Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_kernel_warp.cu 2025-05-07T19:51:12.3854116Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_kernel_warp.cu 
2025-05-07T19:51:12.3854772Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_kernel_warp.cu 2025-05-07T19:51:12.4899765Z Written: gen_embedding_optimizer_partial_rowwise_adam_split_device_kernel.cuh 2025-05-07T19:51:12.4901415Z Written: gen_embedding_backward_split_partial_rowwise_adam.cpp 2025-05-07T19:51:12.4902190Z Written: gen_embedding_split_partial_rowwise_adam_pt2_autograd.cpp 2025-05-07T19:51:12.4902919Z Written: gen_embedding_backward_split_partial_rowwise_adam_pt2_cuda_wrapper.cpp 2025-05-07T19:51:12.4903649Z Written: lookup_partial_rowwise_adam.py 2025-05-07T19:51:12.4904059Z Written: gen_embedding_backward_split_partial_rowwise_adam_cpu.cpp 2025-05-07T19:51:12.4904587Z Written: gen_embedding_backward_split_partial_rowwise_adam_pt2_cpu_wrapper.cpp 2025-05-07T19:51:12.4905169Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_cuda.cu 2025-05-07T19:51:12.4905768Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:12.4906359Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_cuda.cu 2025-05-07T19:51:12.4906945Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_meta.cpp 2025-05-07T19:51:12.4907533Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:12.4908152Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_meta.cpp 2025-05-07T19:51:12.4908743Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_kernel_cta.cu 2025-05-07T19:51:12.4909406Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:12.4910058Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_kernel_cta.cu 2025-05-07T19:51:12.4910661Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_kernel_warp.cu 2025-05-07T19:51:12.4911310Z Written: 
gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:12.4911948Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_kernel_warp.cu 2025-05-07T19:51:12.4912562Z Written: gen_embedding_optimizer_partial_rowwise_lamb_split_device_kernel.cuh 2025-05-07T19:51:12.4913103Z Written: gen_embedding_backward_split_partial_rowwise_lamb.cpp 2025-05-07T19:51:12.4913571Z Written: gen_embedding_split_partial_rowwise_lamb_pt2_autograd.cpp 2025-05-07T19:51:12.4914117Z Written: gen_embedding_backward_split_partial_rowwise_lamb_pt2_cuda_wrapper.cpp 2025-05-07T19:51:12.4914581Z Written: lookup_partial_rowwise_lamb.py 2025-05-07T19:51:12.4915000Z Written: gen_embedding_backward_split_partial_rowwise_lamb_cpu.cpp 2025-05-07T19:51:12.4915527Z Written: gen_embedding_backward_split_partial_rowwise_lamb_pt2_cpu_wrapper.cpp 2025-05-07T19:51:12.4916106Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_cuda.cu 2025-05-07T19:51:12.4916661Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_cuda.cu 2025-05-07T19:51:12.4917181Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_cuda.cu 2025-05-07T19:51:12.4917706Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_cuda.cu 2025-05-07T19:51:12.4918441Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_cuda.cu 2025-05-07T19:51:12.4919035Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:12.4919592Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_cuda.cu 2025-05-07T19:51:12.4920166Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_cuda.cu 2025-05-07T19:51:12.4920732Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_cuda.cu 2025-05-07T19:51:12.4921259Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_cuda.cu 2025-05-07T19:51:12.4921823Z Written: 
gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_meta.cpp 2025-05-07T19:51:12.4922368Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_meta.cpp 2025-05-07T19:51:12.4922928Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_meta.cpp 2025-05-07T19:51:12.4923436Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_meta.cpp 2025-05-07T19:51:12.4924002Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_meta.cpp 2025-05-07T19:51:12.4924605Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:12.4925165Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_meta.cpp 2025-05-07T19:51:12.4925848Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_meta.cpp 2025-05-07T19:51:12.4926395Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_meta.cpp 2025-05-07T19:51:12.4926951Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_meta.cpp 2025-05-07T19:51:12.4927505Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:12.4928107Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:12.4928689Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_kernel_cta.cu 2025-05-07T19:51:12.4929235Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_kernel_cta.cu 2025-05-07T19:51:12.4929814Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:12.4930417Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:12.4931041Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:12.4931653Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:12.4932230Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_kernel_cta.cu 
2025-05-07T19:51:12.4932811Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_kernel_cta.cu 2025-05-07T19:51:12.4933383Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:12.4933995Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:12.4934565Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_kernel_warp.cu 2025-05-07T19:51:12.4935145Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_kernel_warp.cu 2025-05-07T19:51:12.4935747Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:12.4936359Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:12.4937109Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:12.4937909Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:12.4938774Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_kernel_warp.cu 2025-05-07T19:51:12.4939418Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_kernel_warp.cu 2025-05-07T19:51:12.4940060Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_gwd_kernel_cta.cu 2025-05-07T19:51:12.4940726Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_gwd_kernel_cta.cu 2025-05-07T19:51:12.4941457Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_gwd_kernel_cta.cu 2025-05-07T19:51:12.4942148Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_gwd_kernel_cta.cu 2025-05-07T19:51:12.4942805Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_gwd_kernel_warp.cu 2025-05-07T19:51:12.4943490Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_gwd_kernel_warp.cu 2025-05-07T19:51:12.4944181Z Written: 
gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_gwd_kernel_warp.cu 2025-05-07T19:51:12.4944857Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_gwd_kernel_warp.cu 2025-05-07T19:51:12.4945519Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_gwd_cuda.cu 2025-05-07T19:51:12.4946114Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_gwd_cuda.cu 2025-05-07T19:51:12.4946748Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_gwd_cuda.cu 2025-05-07T19:51:12.4947387Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_gwd_cuda.cu 2025-05-07T19:51:12.4947965Z Written: gen_embedding_optimizer_rowwise_adagrad_ssd_device_kernel.cuh 2025-05-07T19:51:12.4948544Z Written: gen_embedding_optimizer_rowwise_adagrad_split_device_kernel.cuh 2025-05-07T19:51:12.4949053Z Written: gen_embedding_backward_ssd_rowwise_adagrad.cpp 2025-05-07T19:51:12.4949587Z Written: gen_embedding_ssd_rowwise_adagrad_pt2_autograd.cpp 2025-05-07T19:51:12.4950202Z Written: gen_embedding_backward_ssd_rowwise_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:51:12.4950659Z Written: lookup_rowwise_adagrad_ssd.py 2025-05-07T19:51:12.4951023Z Written: gen_embedding_backward_split_rowwise_adagrad.cpp 2025-05-07T19:51:12.4951486Z Written: gen_embedding_split_rowwise_adagrad_pt2_autograd.cpp 2025-05-07T19:51:12.4952003Z Written: gen_embedding_backward_split_rowwise_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:51:12.4952428Z Written: lookup_rowwise_adagrad.py 2025-05-07T19:51:12.4952805Z Written: gen_embedding_backward_rowwise_adagrad_split_cpu.cpp 2025-05-07T19:51:12.4953258Z Written: gen_embedding_backward_split_rowwise_adagrad_cpu.cpp 2025-05-07T19:51:12.4953784Z Written: gen_embedding_backward_split_rowwise_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:51:12.4954351Z Written: gen_embedding_optimizer_approx_rowwise_adagrad_split_device_kernel.cuh 2025-05-07T19:51:12.4954911Z Written: gen_embedding_backward_split_approx_rowwise_adagrad.cpp 
2025-05-07T19:51:12.4955428Z Written: gen_embedding_split_approx_rowwise_adagrad_pt2_autograd.cpp 2025-05-07T19:51:12.4955978Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:51:12.4956553Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_cpu.cpp 2025-05-07T19:51:12.4957097Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:51:12.4957761Z Written: gen_embedding_optimizer_rowwise_adagrad_with_weight_decay_split_device_kernel.cuh 2025-05-07T19:51:12.4958394Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay.cpp 2025-05-07T19:51:12.4958952Z Written: gen_embedding_split_rowwise_adagrad_with_weight_decay_pt2_autograd.cpp 2025-05-07T19:51:12.4959593Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_pt2_cuda_wrapper.cpp 2025-05-07T19:51:12.4960218Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_cpu.cpp 2025-05-07T19:51:12.4960861Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_pt2_cpu_wrapper.cpp 2025-05-07T19:51:12.4961551Z Written: gen_embedding_optimizer_approx_rowwise_adagrad_with_weight_decay_split_device_kernel.cuh 2025-05-07T19:51:12.4962229Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay.cpp 2025-05-07T19:51:12.4962861Z Written: gen_embedding_split_approx_rowwise_adagrad_with_weight_decay_pt2_autograd.cpp 2025-05-07T19:51:12.4963541Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_pt2_cuda_wrapper.cpp 2025-05-07T19:51:12.6261785Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_cpu.cpp 2025-05-07T19:51:12.6262817Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_pt2_cpu_wrapper.cpp 2025-05-07T19:51:12.6263672Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_cuda.cu 2025-05-07T19:51:12.6264366Z Written: 
gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_cuda.cu 2025-05-07T19:51:12.6265057Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:12.6265855Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_cuda.cu 2025-05-07T19:51:12.6266520Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_cuda.cu 2025-05-07T19:51:12.6267152Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_meta.cpp 2025-05-07T19:51:12.6268411Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_meta.cpp 2025-05-07T19:51:12.6269117Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:12.6269868Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_meta.cpp 2025-05-07T19:51:12.6270592Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_meta.cpp 2025-05-07T19:51:12.6271435Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:12.6272190Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_kernel_cta.cu 2025-05-07T19:51:12.6272938Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:12.6273720Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:12.6274684Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_kernel_cta.cu 2025-05-07T19:51:12.6275375Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:12.6276084Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_kernel_warp.cu 2025-05-07T19:51:12.6276774Z Written: 
gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:12.6277513Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:12.6278222Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_kernel_warp.cu 2025-05-07T19:51:12.6278875Z Written: gen_embedding_optimizer_rowwise_adagrad_with_counter_split_device_kernel.cuh 2025-05-07T19:51:12.6279467Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter.cpp 2025-05-07T19:51:12.6279993Z Written: gen_embedding_split_rowwise_adagrad_with_counter_pt2_autograd.cpp 2025-05-07T19:51:12.6280605Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter_pt2_cuda_wrapper.cpp 2025-05-07T19:51:12.6281120Z Written: lookup_rowwise_adagrad_with_counter.py 2025-05-07T19:51:12.6281590Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter_cpu.cpp 2025-05-07T19:51:12.6282196Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter_pt2_cpu_wrapper.cpp 2025-05-07T19:51:12.6282846Z Written: gen_embedding_optimizer_approx_rowwise_adagrad_with_counter_split_device_kernel.cuh 2025-05-07T19:51:12.6283496Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter.cpp 2025-05-07T19:51:12.6284069Z Written: gen_embedding_split_approx_rowwise_adagrad_with_counter_pt2_autograd.cpp 2025-05-07T19:51:12.6284734Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_pt2_cuda_wrapper.cpp 2025-05-07T19:51:12.6285405Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_cpu.cpp 2025-05-07T19:51:12.6286049Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_pt2_cpu_wrapper.cpp 2025-05-07T19:51:12.6286818Z Written: gen_embedding_optimizer_rowwise_weighted_adagrad_split_device_kernel.cuh 2025-05-07T19:51:12.6287373Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad.cpp 
2025-05-07T19:51:12.6287906Z Written: gen_embedding_split_rowwise_weighted_adagrad_pt2_autograd.cpp 2025-05-07T19:51:12.6288465Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:51:12.6289057Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad_cpu.cpp 2025-05-07T19:51:12.6289636Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:51:12.6290169Z Written: gen_embedding_backward_sgd_split_weighted_vbe_cuda.cu 2025-05-07T19:51:12.6290634Z Written: gen_embedding_backward_sgd_split_weighted_cuda.cu 2025-05-07T19:51:12.6291083Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:12.6291581Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_cuda.cu 2025-05-07T19:51:12.6292036Z Written: gen_embedding_backward_sgd_split_unweighted_cuda.cu 2025-05-07T19:51:12.6292512Z Written: gen_embedding_backward_sgd_split_weighted_vbe_meta.cpp 2025-05-07T19:51:12.6292981Z Written: gen_embedding_backward_sgd_split_weighted_meta.cpp 2025-05-07T19:51:12.6293443Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:12.6293950Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_meta.cpp 2025-05-07T19:51:12.6294484Z Written: gen_embedding_backward_sgd_split_unweighted_meta.cpp 2025-05-07T19:51:12.6294984Z Written: gen_embedding_backward_sgd_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:12.6295468Z Written: gen_embedding_backward_sgd_split_weighted_kernel_cta.cu 2025-05-07T19:51:12.6295994Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:12.6296554Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:12.6297152Z Written: gen_embedding_backward_sgd_split_unweighted_kernel_cta.cu 2025-05-07T19:51:12.6297891Z Written: gen_embedding_backward_sgd_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:12.6298420Z Written: 
gen_embedding_backward_sgd_split_weighted_kernel_warp.cu 2025-05-07T19:51:12.6298991Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:12.6299571Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:12.6300154Z Written: gen_embedding_backward_sgd_split_unweighted_kernel_warp.cu 2025-05-07T19:51:12.6300680Z Written: gen_embedding_optimizer_sgd_split_device_kernel.cuh 2025-05-07T19:51:12.6301114Z Written: gen_embedding_backward_split_sgd.cpp 2025-05-07T19:51:12.6301528Z Written: gen_embedding_split_sgd_pt2_autograd.cpp 2025-05-07T19:51:12.6301976Z Written: gen_embedding_backward_split_sgd_pt2_cuda_wrapper.cpp 2025-05-07T19:51:12.6302405Z Written: lookup_sgd.py 2025-05-07T19:51:12.6302712Z Written: gen_embedding_backward_sgd_split_cpu.cpp 2025-05-07T19:51:12.6303140Z Written: gen_embedding_backward_split_sgd_cpu.cpp 2025-05-07T19:51:12.6303698Z Written: gen_embedding_backward_split_sgd_pt2_cpu_wrapper.cpp 2025-05-07T19:51:12.6304197Z Written: gen_embedding_optimizer_approx_sgd_split_device_kernel.cuh 2025-05-07T19:51:12.6304659Z Written: gen_embedding_backward_split_approx_sgd.cpp 2025-05-07T19:51:12.6305046Z Written: gen_embedding_split_approx_sgd_pt2_autograd.cpp 2025-05-07T19:51:12.6305509Z Written: gen_embedding_backward_split_approx_sgd_pt2_cuda_wrapper.cpp 2025-05-07T19:51:12.6305962Z Written: gen_embedding_backward_split_approx_sgd_cpu.cpp 2025-05-07T19:51:12.6306417Z Written: gen_embedding_backward_split_approx_sgd_pt2_cpu_wrapper.cpp 2025-05-07T19:51:12.6306891Z Written: gen_embedding_backward_none_split_weighted_cuda.cu 2025-05-07T19:51:12.6307341Z Written: gen_embedding_backward_none_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:12.6307814Z Written: gen_embedding_backward_none_split_unweighted_cuda.cu 2025-05-07T19:51:12.6308249Z Written: gen_embedding_backward_none_split_weighted_meta.cpp 2025-05-07T19:51:12.6308822Z Written: 
gen_embedding_backward_none_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:12.6309290Z Written: gen_embedding_backward_none_split_unweighted_meta.cpp 2025-05-07T19:51:12.6309762Z Written: gen_embedding_backward_none_split_weighted_kernel_cta.cu 2025-05-07T19:51:12.6310254Z Written: gen_embedding_backward_none_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:12.6310781Z Written: gen_embedding_backward_none_split_unweighted_kernel_cta.cu 2025-05-07T19:51:12.6311271Z Written: gen_embedding_backward_none_split_weighted_kernel_warp.cu 2025-05-07T19:51:12.6311785Z Written: gen_embedding_backward_none_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:12.6312317Z Written: gen_embedding_backward_none_split_unweighted_kernel_warp.cu 2025-05-07T19:51:12.6312800Z Written: gen_embedding_optimizer_none_split_device_kernel.cuh 2025-05-07T19:51:12.6313220Z Written: gen_embedding_backward_split_none.cpp 2025-05-07T19:51:12.6313587Z Written: gen_embedding_split_none_pt2_autograd.cpp 2025-05-07T19:51:12.6314033Z Written: gen_embedding_backward_split_none_pt2_cuda_wrapper.cpp 2025-05-07T19:51:12.6314444Z Written: lookup_none.py 2025-05-07T19:51:12.6314741Z Written: gen_embedding_backward_split_none_cpu.cpp 2025-05-07T19:51:12.6315191Z Written: gen_embedding_backward_split_none_pt2_cpu_wrapper.cpp 2025-05-07T19:51:12.6315677Z Written: gen_embedding_backward_split_weighted_device_kernel_hip.hip 2025-05-07T19:51:12.6316299Z Written: gen_embedding_backward_split_unweighted_nobag_device_kernel_hip.hip 2025-05-07T19:51:12.6316816Z Written: gen_embedding_backward_split_unweighted_device_kernel_hip.hip 2025-05-07T19:51:12.6317352Z Written: gen_embedding_backward_ssd_weighted_vbe_device_kernel.cuh 2025-05-07T19:51:12.6317876Z Written: gen_embedding_backward_split_weighted_vbe_device_kernel.cuh 2025-05-07T19:51:12.6318352Z Written: gen_embedding_backward_ssd_weighted_device_kernel.cuh 2025-05-07T19:51:12.6318822Z Written: gen_embedding_backward_split_weighted_device_kernel.cuh 
2025-05-07T19:51:12.6319302Z Written: gen_embedding_backward_ssd_unweighted_nobag_device_kernel.cuh 2025-05-07T19:51:12.6319835Z Written: gen_embedding_backward_split_unweighted_nobag_device_kernel.cuh 2025-05-07T19:51:12.6320338Z Written: gen_embedding_backward_ssd_unweighted_vbe_device_kernel.cuh 2025-05-07T19:51:12.6320847Z Written: gen_embedding_backward_split_unweighted_vbe_device_kernel.cuh 2025-05-07T19:51:12.6321344Z Written: gen_embedding_backward_ssd_unweighted_device_kernel.cuh 2025-05-07T19:51:12.6321804Z Written: gen_embedding_backward_split_unweighted_device_kernel.cuh 2025-05-07T19:51:12.6322282Z Written: gen_embedding_backward_split_common_device_kernel.cuh 2025-05-07T19:51:12.6322713Z Written: gen_embedding_backward_split_grad_embedding_ops.cu 2025-05-07T19:51:12.6323185Z Written: gen_embedding_backward_dense_indice_weights_codegen_cuda.cu 2025-05-07T19:51:12.6323662Z Written: gen_embedding_backward_ssd_indice_weights_codegen_cuda.cu 2025-05-07T19:51:12.6324156Z Written: gen_embedding_backward_split_indice_weights_codegen_cuda.cu 2025-05-07T19:51:12.6324591Z Written: pt2_arg_utils.h 2025-05-07T19:51:12.6324839Z Written: __init__.py 2025-05-07T19:51:12.6325111Z Written: lookup_args_ssd.py 2025-05-07T19:51:12.6325374Z Written: lookup_args.py 2025-05-07T19:51:12.6361445Z 2025-05-07T19:51:12.7436570Z 2025-05-07T19:51:12.7437096Z ================================================================================ 2025-05-07T19:51:12.7438261Z Running code generation script ... 
2025-05-07T19:51:12.7440587Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_embedding_optimizer.py --opensource
2025-05-07T19:51:12.7443002Z ================================================================================
2025-05-07T19:51:12.7443440Z
2025-05-07T19:51:12.7443726Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False)
2025-05-07T19:51:12.7444545Z [GENERATE OPTIMIZERS]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_embedding_optimizer.py', '--opensource']
2025-05-07T19:51:12.7445521Z Written: gen_embedding_optimizer_rowwise_adagrad_split_cuda.cu
2025-05-07T19:51:12.7446011Z Written: gen_embedding_optimizer_rowwise_adagrad_split_kernel.cu
2025-05-07T19:51:12.7446494Z Written: gen_embedding_optimizer_rowwise_adagrad_split.cpp
2025-05-07T19:51:12.7446980Z Written: gen_embedding_optimizer_rowwise_adagrad_split_device_kernel.cuh
2025-05-07T19:51:12.7447500Z Written: split_embedding_optimizer_rowwise_adagrad.py
2025-05-07T19:51:12.7447851Z Written: optimizer_args.py
2025-05-07T19:51:12.7506505Z
2025-05-07T19:51:12.7507083Z
2025-05-07T19:51:12.7507685Z ================================================================================
2025-05-07T19:51:12.7508833Z Running code generation script ...
2025-05-07T19:51:12.7509633Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_quantized.py --opensource 2025-05-07T19:51:12.7510560Z ================================================================================ 2025-05-07T19:51:12.7510811Z 2025-05-07T19:51:12.8776627Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:51:12.8779411Z [GENERATE FORWARD QUANTIZED]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_quantized.py', '--opensource'] 2025-05-07T19:51:12.8780301Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp32_codegen_cuda.cu 2025-05-07T19:51:12.8781289Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp16_codegen_cuda.cu 2025-05-07T19:51:12.8781993Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp8_codegen_cuda.cu 2025-05-07T19:51:12.8782678Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int8_codegen_cuda.cu 2025-05-07T19:51:12.8783376Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int4_codegen_cuda.cu 2025-05-07T19:51:12.8784158Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int2_codegen_cuda.cu 2025-05-07T19:51:12.8784821Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp32_codegen_cuda.cu 2025-05-07T19:51:12.8785528Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp16_codegen_cuda.cu 2025-05-07T19:51:12.8786215Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp8_codegen_cuda.cu 2025-05-07T19:51:12.8786916Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int8_codegen_cuda.cu 2025-05-07T19:51:12.8787597Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int4_codegen_cuda.cu 2025-05-07T19:51:12.8788298Z Written: 
gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int2_codegen_cuda.cu 2025-05-07T19:51:12.8788978Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp32_codegen_cuda.cu 2025-05-07T19:51:12.8789625Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp16_codegen_cuda.cu 2025-05-07T19:51:12.8790284Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp8_codegen_cuda.cu 2025-05-07T19:51:12.8790924Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int8_codegen_cuda.cu 2025-05-07T19:51:12.8791582Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int4_codegen_cuda.cu 2025-05-07T19:51:12.8792243Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int2_codegen_cuda.cu 2025-05-07T19:51:12.8792861Z Written: gen_embedding_forward_quantized_split_nbit_host_weighted_codegen_cuda.cu 2025-05-07T19:51:12.8793483Z Written: gen_embedding_forward_quantized_split_nbit_host_unweighted_nobag_codegen_cuda.cu 2025-05-07T19:51:12.8794101Z Written: gen_embedding_forward_quantized_split_nbit_host_unweighted_codegen_cuda.cu 2025-05-07T19:51:12.8794656Z Written: gen_embedding_forward_quantized_weighted_codegen_cpu.cpp 2025-05-07T19:51:12.8795159Z Written: gen_embedding_forward_quantized_unweighted_codegen_cpu.cpp 2025-05-07T19:51:12.8845715Z 2025-05-07T19:51:12.8845812Z 2025-05-07T19:51:12.8846738Z ================================================================================ 2025-05-07T19:51:12.8847850Z Running code generation script ... 
2025-05-07T19:51:12.8850085Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_split.py --opensource
2025-05-07T19:51:12.8852147Z ================================================================================
2025-05-07T19:51:12.8852380Z
2025-05-07T19:51:13.2941322Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False)
2025-05-07T19:51:13.2943837Z [GENERATE FORWARD SPLIT]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_split.py', '--opensource']
2025-05-07T19:51:13.2945989Z Written: gen_embedding_forward_dense_weighted_vbe_codegen_cuda.cu
2025-05-07T19:51:13.2947404Z Written: gen_embedding_forward_dense_weighted_codegen_cuda.cu
2025-05-07T19:51:13.2948817Z Written: gen_embedding_forward_dense_unweighted_vbe_codegen_cuda.cu
2025-05-07T19:51:13.2950296Z Written: gen_embedding_forward_dense_unweighted_codegen_cuda.cu
2025-05-07T19:51:13.2951134Z Written: gen_embedding_forward_ssd_weighted_vbe_codegen_cuda.cu
2025-05-07T19:51:13.2951589Z Written: gen_embedding_forward_split_weighted_vbe_codegen_cuda.cu
2025-05-07T19:51:13.2952019Z Written: gen_embedding_forward_ssd_weighted_codegen_cuda.cu
2025-05-07T19:51:13.2952688Z Written: gen_embedding_forward_split_weighted_codegen_cuda.cu
2025-05-07T19:51:13.2953138Z Written: gen_embedding_forward_ssd_unweighted_vbe_codegen_cuda.cu
2025-05-07T19:51:13.2953594Z Written: gen_embedding_forward_split_unweighted_vbe_codegen_cuda.cu
2025-05-07T19:51:13.2954056Z Written: gen_embedding_forward_ssd_unweighted_codegen_cuda.cu
2025-05-07T19:51:13.2954481Z Written: gen_embedding_forward_split_unweighted_codegen_cuda.cu
2025-05-07T19:51:13.2954963Z Written: gen_embedding_forward_split_weighted_vbe_gwd_codegen_cuda.cu
2025-05-07T19:51:13.2955431Z Written: gen_embedding_forward_split_weighted_gwd_codegen_cuda.cu
2025-05-07T19:51:13.2955914Z Written: gen_embedding_forward_split_unweighted_vbe_gwd_codegen_cuda.cu
2025-05-07T19:51:13.2956411Z Written: gen_embedding_forward_split_unweighted_gwd_codegen_cuda.cu 2025-05-07T19:51:13.2956873Z Written: gen_embedding_forward_dense_weighted_vbe_codegen_meta.cpp 2025-05-07T19:51:13.2957333Z Written: gen_embedding_forward_dense_weighted_codegen_meta.cpp 2025-05-07T19:51:13.2957792Z Written: gen_embedding_forward_dense_unweighted_vbe_codegen_meta.cpp 2025-05-07T19:51:13.2958261Z Written: gen_embedding_forward_dense_unweighted_codegen_meta.cpp 2025-05-07T19:51:13.2958700Z Written: gen_embedding_forward_ssd_weighted_vbe_codegen_meta.cpp 2025-05-07T19:51:13.2959157Z Written: gen_embedding_forward_split_weighted_vbe_codegen_meta.cpp 2025-05-07T19:51:13.2959612Z Written: gen_embedding_forward_ssd_weighted_codegen_meta.cpp 2025-05-07T19:51:13.2960034Z Written: gen_embedding_forward_split_weighted_codegen_meta.cpp 2025-05-07T19:51:13.2960500Z Written: gen_embedding_forward_ssd_unweighted_vbe_codegen_meta.cpp 2025-05-07T19:51:13.2960972Z Written: gen_embedding_forward_split_unweighted_vbe_codegen_meta.cpp 2025-05-07T19:51:13.2961445Z Written: gen_embedding_forward_ssd_unweighted_codegen_meta.cpp 2025-05-07T19:51:13.2961884Z Written: gen_embedding_forward_split_unweighted_codegen_meta.cpp 2025-05-07T19:51:13.2962338Z Written: gen_embedding_forward_dense_weighted_vbe_kernel.cu 2025-05-07T19:51:13.2962751Z Written: gen_embedding_forward_dense_weighted_kernel.cu 2025-05-07T19:51:13.2963164Z Written: gen_embedding_forward_dense_unweighted_nobag_kernel.cu 2025-05-07T19:51:13.2963623Z Written: gen_embedding_forward_dense_unweighted_vbe_kernel.cu 2025-05-07T19:51:13.2964030Z Written: gen_embedding_forward_dense_unweighted_kernel.cu 2025-05-07T19:51:13.2964444Z Written: gen_embedding_forward_ssd_weighted_vbe_kernel.cu 2025-05-07T19:51:13.2964848Z Written: gen_embedding_forward_split_weighted_vbe_kernel.cu 2025-05-07T19:51:13.2965259Z Written: gen_embedding_forward_ssd_weighted_kernel.cu 2025-05-07T19:51:13.2965736Z Written: 
gen_embedding_forward_split_weighted_kernel.cu 2025-05-07T19:51:13.2966149Z Written: gen_embedding_forward_ssd_unweighted_nobag_kernel.cu 2025-05-07T19:51:13.2966586Z Written: gen_embedding_forward_split_unweighted_nobag_kernel.cu 2025-05-07T19:51:13.2967010Z Written: gen_embedding_forward_ssd_unweighted_vbe_kernel.cu 2025-05-07T19:51:13.2967627Z Written: gen_embedding_forward_split_unweighted_vbe_kernel.cu 2025-05-07T19:51:13.2968222Z Written: gen_embedding_forward_ssd_unweighted_kernel.cu 2025-05-07T19:51:13.2968653Z Written: gen_embedding_forward_split_unweighted_kernel.cu 2025-05-07T19:51:13.2969114Z Written: gen_embedding_forward_split_weighted_vbe_gwd_kernel.cu 2025-05-07T19:51:13.2969598Z Written: gen_embedding_forward_split_weighted_gwd_kernel.cu 2025-05-07T19:51:13.2970086Z Written: gen_embedding_forward_split_unweighted_vbe_gwd_kernel.cu 2025-05-07T19:51:13.2970561Z Written: gen_embedding_forward_split_unweighted_gwd_kernel.cu 2025-05-07T19:51:13.2971032Z Written: gen_embedding_forward_split_weighted_v2_kernel.cu 2025-05-07T19:51:13.2971471Z Written: gen_embedding_forward_split_unweighted_v2_kernel.cu 2025-05-07T19:51:13.2971967Z Written: gen_embedding_forward_dense_unweighted_nobag_kernel_small.cu 2025-05-07T19:51:13.2972488Z Written: gen_embedding_forward_dense_unweighted_nobag_kernel_small.cu 2025-05-07T19:51:13.2973095Z Written: gen_embedding_forward_ssd_unweighted_nobag_kernel_small.cu 2025-05-07T19:51:13.2973610Z Written: gen_embedding_forward_split_unweighted_nobag_kernel_small.cu 2025-05-07T19:51:13.2974185Z Written: gen_embedding_forward_split_pt2_cuda_wrapper.cpp 2025-05-07T19:51:13.2974596Z Written: gen_embedding_forward_split_pt2_cpu_wrapper.cpp 2025-05-07T19:51:13.2974974Z Written: gen_embedding_forward_ssd_pt2_cuda_wrapper.cpp 2025-05-07T19:51:13.3024783Z 2025-05-07T19:51:13.3024895Z 2025-05-07T19:51:13.3025398Z ================================================================================ 2025-05-07T19:51:13.3026482Z Running code 
generation script ... 2025-05-07T19:51:13.3028707Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_index_select.py --opensource 2025-05-07T19:51:13.3030975Z ================================================================================ 2025-05-07T19:51:13.3031580Z 2025-05-07T19:51:13.5943731Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:51:13.5944821Z [INDEX SELECT GENERATOR]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_index_select.py', '--opensource'] 2025-05-07T19:51:13.5945537Z Written: gen_batch_index_select_dim0_forward_codegen_cuda.cu 2025-05-07T19:51:13.5945961Z Written: gen_batch_index_select_dim0_forward_kernel.cu 2025-05-07T19:51:13.5946396Z Written: gen_batch_index_select_dim0_forward_kernel_small.cu 2025-05-07T19:51:13.5946847Z Written: gen_batch_index_select_dim0_backward_codegen_cuda.cu 2025-05-07T19:51:13.5947302Z Written: gen_batch_index_select_dim0_backward_kernel_cta.cu 2025-05-07T19:51:13.5947769Z Written: gen_batch_index_select_dim0_backward_kernel_warp.cu 2025-05-07T19:51:13.5948265Z Written: gen_embedding_backward_split_batch_index_select_device_kernel.cuh 2025-05-07T19:51:13.5948935Z Written: gen_embedding_backward_split_grad_index_select.cu 2025-05-07T19:51:13.5949364Z Written: gen_embedding_backward_split_common_device_kernel.cuh 2025-05-07T19:51:13.6105997Z 2025-05-07T19:51:13.6106110Z 2025-05-07T19:51:13.6106604Z ================================================================================ 2025-05-07T19:51:13.6107982Z GPU CPP Library Target: fbgemm_gpu_experimental_gen_ai (SHARED) 2025-05-07T19:51:13.6109044Z 2025-05-07T19:51:13.6109540Z CPU_SRCS: 2025-05-07T19:51:13.6110581Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/attention.cpp 2025-05-07T19:51:13.6111423Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cpp 2025-05-07T19:51:13.6112013Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cpp 2025-05-07T19:51:13.6112796Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cpp 2025-05-07T19:51:13.6113428Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cpp 2025-05-07T19:51:13.6114069Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cpp 2025-05-07T19:51:13.6114664Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cpp 2025-05-07T19:51:13.6115080Z 2025-05-07T19:51:13.6115253Z GPU_SRCS: 2025-05-07T19:51:13.6115615Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/gqa_attn_splitk.cu 2025-05-07T19:51:13.6116224Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cu 2025-05-07T19:51:13.6116802Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu 2025-05-07T19:51:13.6117337Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cu 2025-05-07T19:51:13.6117934Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu 2025-05-07T19:51:13.6118584Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cu 2025-05-07T19:51:13.6119178Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cu 2025-05-07T19:51:13.6119593Z 2025-05-07T19:51:13.6119784Z CUDA_SPECIFIC_SRCS: 2025-05-07T19:51:13.6120293Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu 2025-05-07T19:51:13.6121219Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16.cu 2025-05-07T19:51:13.6122067Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu 2025-05-07T19:51:13.6122975Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu 2025-05-07T19:51:13.6123841Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16.cu 2025-05-07T19:51:13.6124742Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:51:13.6125709Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:51:13.6126696Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T19:51:13.6127664Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T19:51:13.6128650Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T19:51:13.6129636Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T19:51:13.6130594Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T19:51:13.6131583Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T19:51:13.6132569Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T19:51:13.6133537Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T19:51:13.6134534Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T19:51:13.6135498Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T19:51:13.6136485Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T19:51:13.6137605Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T19:51:13.6138648Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T19:51:13.6139785Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T19:51:13.6140756Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:51:13.6141765Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:51:13.6142758Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T19:51:13.6143722Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T19:51:13.6144716Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T19:51:13.6145683Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T19:51:13.6146669Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T19:51:13.6147662Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T19:51:13.6148601Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16.cu 2025-05-07T19:51:13.6149442Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu 2025-05-07T19:51:13.6150304Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu 2025-05-07T19:51:13.6151119Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_lite.cu 2025-05-07T19:51:13.6151971Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu 2025-05-07T19:51:13.6152972Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu 2025-05-07T19:51:13.6154156Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu 2025-05-07T19:51:13.6155340Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu 2025-05-07T19:51:13.6156509Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu 2025-05-07T19:51:13.6157693Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu 2025-05-07T19:51:13.6158958Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu 2025-05-07T19:51:13.6160112Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu 2025-05-07T19:51:13.6161265Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu 2025-05-07T19:51:13.6162384Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu 2025-05-07T19:51:13.6163711Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu 2025-05-07T19:51:13.6165130Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu 2025-05-07T19:51:13.6166405Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu 2025-05-07T19:51:13.6167916Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu 2025-05-07T19:51:13.6169067Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu 2025-05-07T19:51:13.6170107Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu 2025-05-07T19:51:13.6171022Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu 2025-05-07T19:51:13.6171864Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu 2025-05-07T19:51:13.6172720Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu 2025-05-07T19:51:13.6173598Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu 2025-05-07T19:51:13.6174460Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16.cu 2025-05-07T19:51:13.6175284Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu 2025-05-07T19:51:13.6176111Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.cu 2025-05-07T19:51:13.6177113Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16_fast_gemv.cu 2025-05-07T19:51:13.6177873Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu 2025-05-07T19:51:13.6178684Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu 2025-05-07T19:51:13.6179471Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cu 2025-05-07T19:51:13.6180231Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cuh 2025-05-07T19:51:13.6181007Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/utility.cuh 2025-05-07T19:51:13.6181525Z 2025-05-07T19:51:13.6181772Z HIP_SPECIFIC_SRCS: 2025-05-07T19:51:13.6182176Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gemm/ck_extensions.hip 2025-05-07T19:51:13.6182787Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gemm/gemm.cpp 2025-05-07T19:51:13.6183556Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/bf16_grouped_gemm.hip 2025-05-07T19:51:13.6184791Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x32x128_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6186327Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x32x64_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6187859Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x32x64_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6189366Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x64x128_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v1.hip 2025-05-07T19:51:13.6190912Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x64x128_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v2.hip 2025-05-07T19:51:13.6192315Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x64x128_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:13.6193803Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:13.6195210Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6196633Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6198044Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6199473Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x64_16x16_1x3_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6200872Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x16x64_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6202303Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x64x128_32x32_1x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6203731Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x64x128_32x32_1x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6206332Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x96x128_16x16_2x3_16x8x1_16x8x1_1x32x1x4_8x8x1_2x1_intrawave_v2.hip 2025-05-07T19:51:13.6207759Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x64x128x64_32x32_2x2_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6209190Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x64x96x64_16x16_4x3_8x16x1_8x16x1_1x32x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6210635Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x128x128_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6212063Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x128x64_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:13.6213489Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x128x64_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6214895Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x224x64_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6216329Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x256x64_32x32_4x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6218056Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x96x64_16x16_4x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6219594Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x16x128x128_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:13.6221152Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x16x128x128_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:13.6222723Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x16x64x128_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6224359Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x224x256x32_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6225903Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x128x32_32x32_4x2_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:13.6227465Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x160x64_16x16_8x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6228982Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x192x64_32x32_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6230596Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x224x64_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6232024Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x256x64_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6233443Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x128x128_16x16_1x4_16x16x1_16x16x1_1x32x1x8_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:13.6234946Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x224x64_16x16_1x7_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6236370Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x96x64_16x16_1x3_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:13.6237767Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x96x64_16x16_1x3_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6239199Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x64x128x128_32x32_2x1_16x16x1_16x16x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6240651Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x64x192x128_16x16_4x3_16x16x1_16x16x1_1x32x1x8_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6242057Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x64x96x64_16x16_2x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6243478Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x16x128_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6244897Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x16x128_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6246431Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x16x64_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6247820Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x32x128_16x16_1x2_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:13.6249220Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x48x128_16x16_1x3_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6250611Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x64x128_16x16_1x4_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:13.6251716Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/ck_utility.hip 2025-05-07T19:51:13.6252452Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_blockwise_gemm.hip 2025-05-07T19:51:13.6253235Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/fp8_rowwise_gemm.hip 2025-05-07T19:51:13.6254377Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x128x16x128_16x16_4x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6255762Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x128x32x128_32x32_2x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6257216Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6258935Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2_4_split_k.hip 2025-05-07T19:51:13.6260506Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2_8_split_k.hip 2025-05-07T19:51:13.6262136Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6263639Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6265163Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2_8_split_k.hip 2025-05-07T19:51:13.6266712Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x256_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6268373Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6269893Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_interwave_v2_2_split_k.hip 2025-05-07T19:51:13.6271426Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6272976Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2_2_split_k.hip 2025-05-07T19:51:13.6274493Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6276005Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x128x128_32x32_1x2_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6277521Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v1.hip 2025-05-07T19:51:13.6279145Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6280759Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x256_16x16_1x1_16x8x1_16x8x1_1x32x1x4_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:13.6282158Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x512_16x16_1x1_32x4x1_32x4x1_1x32x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6283536Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x512_16x16_1x1_32x4x1_32x4x1_1x32x1x4_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6284934Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x64x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6286346Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x64x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6287736Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x64x32x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6289123Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x64x32x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6290594Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_16x16_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6291980Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:13.6293383Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6294756Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v5.hip 2025-05-07T19:51:13.6296133Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x256_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6297752Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x64_32x32_2x2_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_intrawave_v4.hip 2025-05-07T19:51:13.6299223Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x160x128_16x16_4x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6300717Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x160x128_32x32_1x5_8x32x1_8x32x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6302204Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x192x128_32x32_2x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6303698Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x256x128_32x32_2x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6305200Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x64x128_32x32_2x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6306693Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x64x256_32x32_2x1_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6308225Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x96x128_16x16_4x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6309721Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x96x256_32x32_1x3_16x16x1_16x16x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6311227Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x160x128x128_16x16_5x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6312593Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x160x256x128_16x16_5x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6313956Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x160x96x128_16x16_5x3_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6315364Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x16x64x128_16x16_1x1_16x16x1_8x32x1_1x16x1x16_4x4x1_1x1_intrawave_v2_8_split_k.hip 2025-05-07T19:51:13.6316762Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6318181Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6319551Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x128x128_16x16_6x4_8x32x1_8x32x1_1x32x1x8_8x8x1_2x2_intrawave_v3.hip 2025-05-07T19:51:13.6320918Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x192x128_16x16_6x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6322296Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x224x128_16x16_6x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6323681Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x256x128_16x16_6x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6325052Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x256x128_16x16_6x8_8x32x1_8x32x1_1x32x1x8_8x8x1_2x2_intrawave_v3.hip 2025-05-07T19:51:13.6326430Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x224x160x128_16x16_7x5_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6327813Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x224x192x128_16x16_7x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6329188Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6330569Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x128x128_16x16_8x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6331943Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x128x128_32x32_4x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6333301Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x160x128_16x16_8x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6334769Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x192x128_16x16_8x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6336144Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x192x128_32x32_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6337757Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6339254Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x256x128_16x16_8x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6340736Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x256x64_16x16_8x8_4x64x1_4x64x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6342209Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x256x64_32x32_4x4_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_intrawave_v4.hip 2025-05-07T19:51:13.6343695Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x96x128_16x16_8x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6345230Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x96x128_32x32_2x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6346917Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x32x128x256_32x32_1x1_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6348390Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x32x64x512_16x16_1x2_32x8x1_32x8x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6349965Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x128x128_32x32_1x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6351348Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x128x256_32x32_1x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6352717Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x16x512_16x16_1x1_32x8x1_32x8x1_1x64x1x4_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6354073Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x192x128_32x32_1x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6355442Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x192x256_32x32_1x3_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6356834Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x256x128_32x32_1x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6358184Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x64x128_32x32_1x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6359547Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x64x512_32x32_1x1_32x8x1_32x8x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6360921Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x96x256_16x16_2x3_16x16x1_16x16x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6362344Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x80x128x256_16x16_5x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6363713Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x96x128x128_16x16_3x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6365073Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:13.6366423Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6367894Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6369272Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x4x1x16_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6370605Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x512_16x16_1x1_32x2x1_32x2x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6371973Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x512_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6373431Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x64_16x16_1x1_4x16x1_4x16x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6374577Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/fp8_rowwise_batched_gemm.hip 2025-05-07T19:51:13.6375807Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:13.6377349Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:13.6379121Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6380743Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6382352Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6383944Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6385553Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x512_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6387159Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x512_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6388777Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x32x128x128_32x32_1x2_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6390533Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x32x64x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6392026Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6393520Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v4.hip 2025-05-07T19:51:13.6395033Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v5.hip 2025-05-07T19:51:13.6396529Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x256_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6398017Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x160x128_32x32_1x5_8x32x1_8x32x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6399507Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x192x128_32x32_2x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6401048Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x256x128_32x32_2x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6402533Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x64x128_32x32_2x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6404031Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x96x256_32x32_1x3_16x16x1_16x16x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6405531Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6407023Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6408519Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x128x128_16x16_8x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6410020Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x160x128_16x16_8x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6411508Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x192x128_16x16_8x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6413019Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6414535Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x256x128_16x16_8x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6416026Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x32x128x256_32x32_1x1_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6417819Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x32x64x512_16x16_1x2_32x8x1_32x8x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6419433Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x128x256_32x32_1x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6421060Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x192x256_32x32_1x3_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6422680Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x64x128_32x32_1x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6424288Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x64x512_32x32_1x1_32x8x1_32x8x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6425881Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_32x2x1_32x2x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6427481Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_32x2x1_32x2x1_1x16x1x4_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6429146Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4_1x1_interwave_v1.hip 2025-05-07T19:51:13.6430744Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4_1x1_interwave_v2.hip 2025-05-07T19:51:13.6431955Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/fp8_rowwise_grouped_gemm.hip 2025-05-07T19:51:13.6433177Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6434652Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6436144Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x256_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6437642Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6439110Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6440754Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6442251Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x64x256_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v1.hip 2025-05-07T19:51:13.6443743Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x64x256_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v2.hip 2025-05-07T19:51:13.6445299Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x64x256_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:13.6446790Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x96x256_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6448256Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x32x16x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6449741Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x32x64x256_16x16_1x4_16x8x1_16x8x1_1x32x1x4_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:13.6451232Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x32x64x256_32x32_1x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:13.6452718Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x64x64x256_32x32_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6454203Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x64x64x256_32x32_2x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6455812Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:13.6457355Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6459168Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x128x256_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6460801Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x224x128_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6462415Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x256x128_32x32_4x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6464040Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x96x128_16x16_4x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6465661Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x128x256_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:13.6467297Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x128x256_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:13.6469051Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x128x256_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6470815Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x256_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6472393Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x256_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6474079Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x256_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6475674Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6477234Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x192x96x128_16x16_6x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6478827Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:13.6480464Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x128x64_32x32_4x2_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:13.6481955Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x160x128_32x32_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6483448Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x192x128_32x32_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6485018Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6486514Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6488035Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_8x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6489542Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x128x128_16x16_1x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_interwave_v2.hip 2025-05-07T19:51:13.6491038Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x160x128_16x16_1x5_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6492546Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x160x128_16x16_1x5_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6494050Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x256x128_16x16_1x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:13.6495544Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x32x512_16x16_1x1_32x8x1_32x8x1_1x32x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6497087Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x32x512_16x16_1x1_32x8x1_32x8x1_1x32x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:13.6498880Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x64x512_16x16_2x1_32x8x1_32x8x1_1x32x1x8_8x8x1_2x1_intrawave_v2.hip 2025-05-07T19:51:13.6500495Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x128x256_32x32_1x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6502222Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x128x256_32x32_2x1_16x16x1_16x16x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:13.6503855Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x160x128_16x16_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6505474Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x192x128_16x16_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:13.6507095Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6508716Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:13.6510386Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:13.6511880Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x32x256_16x16_1x2_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:13.6513423Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x64x256_16x16_1x4_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_interwave_v1.hip 2025-05-07T19:51:13.6514903Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x64x256_16x16_1x4_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:13.6516040Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_tensorwise_gemm.hip 2025-05-07T19:51:13.6516558Z 2025-05-07T19:51:13.6516734Z OTHER_SRCS: 2025-05-07T19:51:13.6516851Z 2025-05-07T19:51:13.6516941Z 2025-05-07T19:51:13.6517109Z CC_FLAGS: 2025-05-07T19:51:13.6517214Z 2025-05-07T19:51:13.6517303Z 2025-05-07T19:51:13.6517468Z NVCC_FLAGS: 2025-05-07T19:51:13.6517599Z 2025-05-07T19:51:13.6517674Z 2025-05-07T19:51:13.6517842Z HIPCC_FLAGS: 2025-05-07T19:51:13.6517970Z 2025-05-07T19:51:13.6518040Z 2025-05-07T19:51:13.6518204Z INCLUDE_DIRS: 2025-05-07T19:51:13.6518428Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:13.6518738Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:51:13.6518995Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:51:13.6519292Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:13.6519745Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include 2025-05-07T19:51:13.6520484Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:51:13.6521084Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:51:13.6521479Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:51:13.6521873Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:51:13.6522323Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:51:13.6522819Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 
2025-05-07T19:51:13.6523242Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:51:13.6523763Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include 2025-05-07T19:51:13.6524326Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize 2025-05-07T19:51:13.6524686Z 2025-05-07T19:51:13.6524870Z Selected Source Files: 2025-05-07T19:51:13.6525274Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/attention.cpp 2025-05-07T19:51:13.6525863Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cpp 2025-05-07T19:51:13.6526474Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cpp 2025-05-07T19:51:13.6527024Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cpp 2025-05-07T19:51:13.6527591Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cpp 2025-05-07T19:51:13.6528232Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cpp 2025-05-07T19:51:13.6528792Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cpp 2025-05-07T19:51:13.6529395Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/gqa_attn_splitk.cu 2025-05-07T19:51:13.6529984Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cu 2025-05-07T19:51:13.6530517Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu 2025-05-07T19:51:13.6531034Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cu 2025-05-07T19:51:13.6531579Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu 2025-05-07T19:51:13.6532180Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cu 2025-05-07T19:51:13.6532728Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cu 2025-05-07T19:51:13.6533381Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu 
2025-05-07T19:51:13.6534219Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16.cu 2025-05-07T19:51:13.6534998Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu 2025-05-07T19:51:13.6535862Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu 2025-05-07T19:51:13.6536626Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16.cu 2025-05-07T19:51:13.6537685Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:51:13.6538682Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:51:13.6539634Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T19:51:13.6540604Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T19:51:13.6541566Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T19:51:13.6542512Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T19:51:13.6543462Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T19:51:13.6544409Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T19:51:13.6545375Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T19:51:13.6546530Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T19:51:13.6547487Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T19:51:13.6548458Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T19:51:13.6549399Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T19:51:13.6550429Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T19:51:13.6551744Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T19:51:13.6552613Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T19:51:13.6553496Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:51:13.6554362Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:51:13.6555245Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T19:51:13.6556126Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T19:51:13.6556999Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T19:51:13.6557882Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T19:51:13.6558766Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T19:51:13.6559640Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T19:51:13.6560473Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16.cu 2025-05-07T19:51:13.6561204Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu 2025-05-07T19:51:13.6561973Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu 2025-05-07T19:51:13.6562711Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_lite.cu 2025-05-07T19:51:13.6563444Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu 2025-05-07T19:51:13.6564362Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu 2025-05-07T19:51:13.6565417Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu 2025-05-07T19:51:13.6566486Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu 2025-05-07T19:51:13.6567939Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu 2025-05-07T19:51:13.6582743Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu 2025-05-07T19:51:13.6583907Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu 2025-05-07T19:51:13.6585071Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu 2025-05-07T19:51:13.6586206Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu 2025-05-07T19:51:13.6587350Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu 2025-05-07T19:51:13.6588654Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu 2025-05-07T19:51:13.6590064Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu 2025-05-07T19:51:13.6591443Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu 2025-05-07T19:51:13.6592593Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu 2025-05-07T19:51:13.6593844Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu 2025-05-07T19:51:13.6594800Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu 2025-05-07T19:51:13.6595652Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu 2025-05-07T19:51:13.6596445Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu 2025-05-07T19:51:13.6597338Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu 2025-05-07T19:51:13.6598139Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu 2025-05-07T19:51:13.6598885Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16.cu 2025-05-07T19:51:13.6599607Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu 2025-05-07T19:51:13.6600347Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.cu 2025-05-07T19:51:13.6601138Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16_fast_gemv.cu 2025-05-07T19:51:13.6601816Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu 2025-05-07T19:51:13.6602518Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu 2025-05-07T19:51:13.6603212Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cu 2025-05-07T19:51:13.6603883Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cuh 2025-05-07T19:51:13.6604569Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/utility.cuh 2025-05-07T19:51:13.6605028Z 2025-05-07T19:51:13.6605202Z HIPified Source Files: 2025-05-07T19:51:13.6605342Z 2025-05-07T19:51:13.6605419Z 2025-05-07T19:51:13.6605593Z Library Dependencies: 2025-05-07T19:51:13.6605802Z torch 2025-05-07T19:51:13.6605973Z torch_library 2025-05-07T19:51:13.6606368Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10.so 2025-05-07T19:51:13.6606988Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:51:13.6607637Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:51:13.6608358Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib/libnccl.so.2 
2025-05-07T19:51:13.6609025Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:51:13.6609581Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:51:13.6609943Z 2025-05-07T19:51:13.6610116Z Output Library: 2025-05-07T19:51:13.6610329Z fbgemm_gpu_experimental_gen_ai 2025-05-07T19:51:13.6610568Z 2025-05-07T19:51:13.6610739Z Destination Directory: 2025-05-07T19:51:13.6610890Z 2025-05-07T19:51:13.6610996Z ================================================================================ 2025-05-07T19:51:13.6611216Z 2025-05-07T19:51:13.6611219Z 2025-05-07T19:51:13.6611223Z 2025-05-07T19:51:13.6611329Z ================================================================================ 2025-05-07T19:51:13.6611671Z Adding to Package: fbgemm_gpu/experimental/gen_ai 2025-05-07T19:51:13.6611974Z 2025-05-07T19:51:13.6612134Z TARGETS: 2025-05-07T19:51:13.6612331Z fbgemm_gpu_experimental_gen_ai 2025-05-07T19:51:13.6612558Z 2025-05-07T19:51:13.6612722Z FILES: 2025-05-07T19:51:13.6612825Z 2025-05-07T19:51:13.6612933Z ================================================================================ 2025-05-07T19:51:13.6613198Z 2025-05-07T19:51:13.6613246Z 2025-05-07T19:51:13.6613249Z 2025-05-07T19:51:13.6613362Z ================================================================================ 2025-05-07T19:51:13.6613752Z GPU CPP Library Target: fbgemm_gpu_experimental_example_py (SHARED) 2025-05-07T19:51:13.6614112Z 2025-05-07T19:51:13.6614279Z CPU_SRCS: 2025-05-07T19:51:13.6614395Z 2025-05-07T19:51:13.6614460Z 2025-05-07T19:51:13.6614620Z GPU_SRCS: 2025-05-07T19:51:13.6614940Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_nccl.cpp 2025-05-07T19:51:13.6615449Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_ops.cpp 2025-05-07T19:51:13.6615971Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/cutlass_sgemm_nn.cu 2025-05-07T19:51:13.6616359Z 2025-05-07T19:51:13.6616527Z 
CUDA_SPECIFIC_SRCS: 2025-05-07T19:51:13.6616656Z 2025-05-07T19:51:13.6616730Z 2025-05-07T19:51:13.6616895Z HIP_SPECIFIC_SRCS: 2025-05-07T19:51:13.6617137Z 2025-05-07T19:51:13.6617205Z 2025-05-07T19:51:13.6617540Z OTHER_SRCS: 2025-05-07T19:51:13.6617666Z 2025-05-07T19:51:13.6617739Z 2025-05-07T19:51:13.6617908Z CC_FLAGS: 2025-05-07T19:51:13.6618099Z 2025-05-07T19:51:13.6618170Z 2025-05-07T19:51:13.6618348Z NVCC_FLAGS: 2025-05-07T19:51:13.6618463Z 2025-05-07T19:51:13.6618532Z 2025-05-07T19:51:13.6618712Z HIPCC_FLAGS: 2025-05-07T19:51:13.6618921Z 2025-05-07T19:51:13.6618995Z 2025-05-07T19:51:13.6619175Z INCLUDE_DIRS: 2025-05-07T19:51:13.6619402Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:13.6619713Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:51:13.6619984Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:51:13.6620293Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:13.6620774Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include 2025-05-07T19:51:13.6621555Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:51:13.6622199Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:51:13.6622606Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:51:13.6623036Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:51:13.6623496Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:51:13.6624015Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 2025-05-07T19:51:13.6624466Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:51:13.6625014Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include 2025-05-07T19:51:13.6625509Z 2025-05-07T19:51:13.6625694Z Selected Source Files: 2025-05-07T19:51:13.6626075Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_nccl.cpp 2025-05-07T19:51:13.6626620Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_ops.cpp 2025-05-07T19:51:13.6627177Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/cutlass_sgemm_nn.cu 2025-05-07T19:51:13.6627588Z 2025-05-07T19:51:13.6627787Z HIPified Source Files: 2025-05-07T19:51:13.6627938Z 2025-05-07T19:51:13.6628029Z 2025-05-07T19:51:13.6628218Z Library Dependencies: 2025-05-07T19:51:13.6628450Z torch 2025-05-07T19:51:13.6628651Z torch_library 2025-05-07T19:51:13.6629087Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10.so 2025-05-07T19:51:13.6629882Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:51:13.6630536Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:51:13.6631271Z /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:51:13.6631955Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:51:13.6632530Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:51:13.6632898Z 2025-05-07T19:51:13.6633096Z Output Library: 2025-05-07T19:51:13.6633324Z fbgemm_gpu_experimental_example_py 2025-05-07T19:51:13.6633662Z 2025-05-07T19:51:13.6633847Z Destination Directory: 2025-05-07T19:51:13.6634006Z 2025-05-07T19:51:13.6634117Z ================================================================================ 2025-05-07T19:51:13.6634335Z 2025-05-07T19:51:13.6634339Z 2025-05-07T19:51:13.6634342Z 2025-05-07T19:51:13.6634472Z ================================================================================ 2025-05-07T19:51:13.6634817Z Adding to Package: fbgemm_gpu/experimental/example 2025-05-07T19:51:13.6635145Z 2025-05-07T19:51:13.6635309Z TARGETS: 2025-05-07T19:51:13.6635536Z fbgemm_gpu_experimental_example_py 2025-05-07T19:51:13.6635789Z 2025-05-07T19:51:13.6635976Z FILES: 2025-05-07T19:51:13.6636294Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/example/__init__.py 2025-05-07T19:51:13.6636782Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/example/utils.py 2025-05-07T19:51:13.6637173Z ================================================================================ 2025-05-07T19:51:13.6637384Z 2025-05-07T19:51:13.6637388Z 2025-05-07T19:51:13.6637392Z 2025-05-07T19:51:13.6637492Z ================================================================================ 2025-05-07T19:51:13.6637867Z Adding to Package: fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T19:51:13.6638204Z 2025-05-07T19:51:13.6638404Z TARGETS: 2025-05-07T19:51:13.6638505Z 2025-05-07T19:51:13.6638587Z 2025-05-07T19:51:13.6638737Z FILES: 2025-05-07T19:51:13.6639051Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py 2025-05-07T19:51:13.6639561Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py 2025-05-07T19:51:13.6640102Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py 2025-05-07T19:51:13.6640667Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py 2025-05-07T19:51:13.6641209Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py 2025-05-07T19:51:13.6641629Z ================================================================================ 2025-05-07T19:51:13.6641838Z 2025-05-07T19:51:13.6641930Z -- Configuring done (9.0s) 2025-05-07T19:51:13.6642195Z -- Generating done (0.0s) 2025-05-07T19:51:13.6642648Z -- Build files have been written to: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-build 2025-05-07T19:51:13.6815357Z Change Dir: '/__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-build' 2025-05-07T19:51:13.6816508Z 2025-05-07T19:51:13.6817636Z Run Build Command(s): /github/home/miniconda/envs/build_binary/bin/ninja -v -j 48 install 2025-05-07T19:51:13.7920844Z [1/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO 
-DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp 2025-05-07T19:51:13.7932405Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.8147856Z [2/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp 2025-05-07T19:51:13.8159922Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.8197428Z [3/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o -c 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp 2025-05-07T19:51:13.8209299Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.8241966Z [4/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp 2025-05-07T19:51:13.8254049Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.8265502Z [5/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp 2025-05-07T19:51:13.8277413Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.8391203Z [6/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion 
-Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp 2025-05-07T19:51:13.8402307Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.8413597Z [7/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp 2025-05-07T19:51:13.8425265Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' 
[-Wunused-command-line-argument] 2025-05-07T19:51:13.8472014Z [8/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp 2025-05-07T19:51:13.8483915Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.8494594Z [9/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp 2025-05-07T19:51:13.8505630Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.8653713Z [10/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o -MF 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp 2025-05-07T19:51:13.8666040Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.8788068Z [11/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp 2025-05-07T19:51:13.8803136Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.8955973Z [12/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL 
-DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp 2025-05-07T19:51:13.8967246Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.9084099Z [13/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp 
-stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp 2025-05-07T19:51:13.9096324Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.9194639Z [14/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp 2025-05-07T19:51:13.9206956Z clang-16: warning: argument unused 
during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.9218584Z [15/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp 2025-05-07T19:51:13.9230392Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.9242480Z [16/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp 2025-05-07T19:51:13.9254305Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.9349712Z [17/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp 2025-05-07T19:51:13.9361804Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.9397401Z [18/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp 2025-05-07T19:51:13.9408870Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.9559885Z [19/156] 
/github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp 2025-05-07T19:51:13.9571836Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.9786276Z [20/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp 2025-05-07T19:51:13.9798379Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:13.9826918Z [21/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o -c 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp 2025-05-07T19:51:13.9838530Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.0150646Z [22/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp 2025-05-07T19:51:14.0162518Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.0174292Z [23/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp 2025-05-07T19:51:14.0186548Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.0216026Z [24/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC 
-Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp 2025-05-07T19:51:14.0227461Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.0257939Z [25/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp 2025-05-07T19:51:14.0269669Z clang-16: warning: argument unused during compilation: 
'-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.0281059Z [26/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp 2025-05-07T19:51:14.0292415Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.0581592Z [27/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp 2025-05-07T19:51:14.0594283Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.0605491Z [28/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp 2025-05-07T19:51:14.0616877Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.0725154Z [29/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp 2025-05-07T19:51:14.0736784Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.0846617Z 
[30/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp 2025-05-07T19:51:14.0858469Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.0906187Z [31/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp 2025-05-07T19:51:14.0917717Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.1017104Z [32/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o -c 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp 2025-05-07T19:51:14.1029183Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.1149163Z [33/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp 2025-05-07T19:51:14.1160713Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.1336788Z [34/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp 2025-05-07T19:51:14.1348497Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.1528785Z [35/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion 
-Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp 2025-05-07T19:51:14.1540222Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.1651212Z [36/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp 2025-05-07T19:51:14.1662570Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' 
[-Wunused-command-line-argument] 2025-05-07T19:51:14.1786239Z [37/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp 2025-05-07T19:51:14.1797882Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.1808964Z [38/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp 2025-05-07T19:51:14.1820761Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.2063518Z [39/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o -MF 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp 2025-05-07T19:51:14.2076212Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.2199123Z [40/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp 2025-05-07T19:51:14.2210711Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.2299066Z [41/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO 
-DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp 2025-05-07T19:51:14.2309821Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.2460574Z [42/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp 2025-05-07T19:51:14.2473569Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.2485821Z [43/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o -c 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp 2025-05-07T19:51:14.2498571Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.3097685Z [44/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp 2025-05-07T19:51:14.3110175Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.3170681Z [45/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp 2025-05-07T19:51:14.3182193Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.3643464Z [46/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion 
-Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp 2025-05-07T19:51:14.3656113Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.3855382Z [47/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp 2025-05-07T19:51:14.3867875Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' 
[-Wunused-command-line-argument] 2025-05-07T19:51:14.4125319Z [48/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp 2025-05-07T19:51:14.4136427Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.4395153Z [49/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp 2025-05-07T19:51:14.4406241Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.4508983Z [50/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o.d -o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp 2025-05-07T19:51:14.4515633Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.4817300Z [51/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp 2025-05-07T19:51:14.4829236Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.5426305Z [52/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS 
-I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp 2025-05-07T19:51:14.5438289Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.5906854Z [53/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ 
-I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp 2025-05-07T19:51:14.5919436Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.7256371Z [54/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp 2025-05-07T19:51:14.7269515Z clang-16: warning: argument unused during 
compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.8587310Z [55/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp 2025-05-07T19:51:14.8599550Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.8998564Z [56/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -mavx512f -mavx512bw -mavx512dq -mavx512vl -fopenmp -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc 2025-05-07T19:51:14.9018029Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:14.9175517Z [57/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc 2025-05-07T19:51:14.9194632Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 
2025-05-07T19:51:14.9507416Z [58/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp 2025-05-07T19:51:14.9520068Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:15.0723577Z [59/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA 
-DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp 2025-05-07T19:51:15.0736457Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:15.1423611Z [60/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o.d -o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp 2025-05-07T19:51:15.1435591Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:15.2412925Z [61/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion 
-Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o -c /__w/FBGEMM/FBGEMM/src/QuantUtils.cc 2025-05-07T19:51:15.2431417Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:15.4663213Z [62/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp 2025-05-07T19:51:15.4676258Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:16.0970527Z [63/156] : && /github/home/miniconda/envs/build_binary/bin/c++ 
-fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,asmjit.so -o asmjit.so CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o 
-L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib:/github/home/miniconda/envs/build_binary/lib/stubs: /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libtorch.so /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libtorch.so" -Wl,--as-needed && : 2025-05-07T19:51:16.1042402Z [64/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-build && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-build/asmjit.so 2025-05-07T19:51:16.1044614Z ################################################################################ 2025-05-07T19:51:16.1045253Z [CMAKE] Running post-build script ... 
2025-05-07T19:51:16.1046189Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-build/asmjit.so 2025-05-07T19:51:16.1047123Z Removing all RPATHs ... 2025-05-07T19:51:16.1047663Z ################################################################################ 2025-05-07T19:51:16.3354570Z [65/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o -c /__w/FBGEMM/FBGEMM/src/Utils.cc 2025-05-07T19:51:16.3371027Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:16.7644671Z [66/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 
-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o -c /__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc 2025-05-07T19:51:16.7663498Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:20.0659034Z [67/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o -c /__w/FBGEMM/FBGEMM/src/RefImplementations.cc 2025-05-07T19:51:20.0676070Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:20.3962902Z [68/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o -c /__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc 2025-05-07T19:51:20.3980409Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:22.3402356Z [69/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc 2025-05-07T19:51:22.3421933Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:22.6884005Z [70/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cpp 2025-05-07T19:51:22.6903052Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:22.7807929Z [71/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cpp 2025-05-07T19:51:22.7827769Z clang-16: warning: argument unused during compilation: 
'-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:22.8222692Z [72/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/attention.cpp 2025-05-07T19:51:22.8241753Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:22.9545721Z [73/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cpp 2025-05-07T19:51:22.9564481Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.4403795Z [74/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cpp 2025-05-07T19:51:24.4422200Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.9151282Z [75/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cpp 2025-05-07T19:51:24.9170676Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.1063162Z [76/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cpp 2025-05-07T19:51:25.1082911Z clang-16: warning: 
argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.8491769Z [77/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o.d -o 
CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o -c /__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc 2025-05-07T19:51:25.8509327Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:39.3802779Z [78/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc 2025-05-07T19:51:39.3821303Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:52:19.8997880Z [79/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 
-gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o 2025-05-07T19:52:19.9022216Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:52:36.6127507Z [80/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o 2025-05-07T19:52:36.6250667Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:56:43.3172825Z [81/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o 2025-05-07T19:56:43.3304624Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:56:43.3316876Z [82/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o 2025-05-07T19:56:43.3329001Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:56:43.3338106Z [83/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc 2025-05-07T19:56:43.3347530Z clang-16: warning: argument unused during compilation: 
'-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:56:50.3435906Z [84/156] : && /github/home/miniconda/envs/build_binary/bin/c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,fbgemm.so -o fbgemm.so CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -Wl,-rpath,"\$ORIGIN" /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10_cuda.so 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so asmjit.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libtorch.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libtorch.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so && : 2025-05-07T19:56:51.5732159Z [85/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-build && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-build/fbgemm.so 1 2025-05-07T19:56:51.5733393Z ################################################################################ 2025-05-07T19:56:51.5733757Z [CMAKE] Running post-build script ... 2025-05-07T19:56:51.5734277Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-build/fbgemm.so 2025-05-07T19:56:51.5734801Z Resetting RPATH to $ORIGIN ... 
2025-05-07T19:56:51.5735190Z 0x000000000000000f (RPATH) Library rpath: [$ORIGIN] 2025-05-07T19:56:51.5735601Z ################################################################################ 2025-05-07T19:56:56.1706113Z [86/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_lite.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o 2025-05-07T19:56:56.1727911Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:56:56.4267023Z [87/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/gqa_attn_splitk.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o 2025-05-07T19:56:56.4288958Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:11.5667786Z [88/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o 2025-05-07T19:57:11.5680142Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:11.5681773Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5682771Z static auto dtype() { 2025-05-07T19:57:11.5683068Z ^ 2025-05-07T19:57:11.5683215Z 2025-05-07T19:57:11.5683498Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:11.5683873Z 2025-05-07T19:57:11.5684716Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5685737Z static auto dtype() { 2025-05-07T19:57:11.5686045Z ^ 2025-05-07T19:57:11.5686191Z 2025-05-07T19:57:11.5687021Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5688105Z static auto dtype() { 2025-05-07T19:57:11.5688469Z ^ 2025-05-07T19:57:11.5688646Z 2025-05-07T19:57:11.5689429Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5690465Z static auto dtype() { 2025-05-07T19:57:11.5690740Z ^ 2025-05-07T19:57:11.5690909Z 2025-05-07T19:57:11.5691163Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:11.5691534Z 2025-05-07T19:57:11.5692310Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5693327Z static auto dtype() { 2025-05-07T19:57:11.5693617Z ^ 2025-05-07T19:57:11.5693755Z 2025-05-07T19:57:11.5694580Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5695662Z static auto dtype() { 2025-05-07T19:57:11.5695930Z ^ 2025-05-07T19:57:11.5696098Z 2025-05-07T19:57:11.5696938Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5698058Z static auto dtype() { 2025-05-07T19:57:11.5698329Z ^ 2025-05-07T19:57:11.5698507Z 2025-05-07T19:57:11.5698766Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:11.5699136Z 2025-05-07T19:57:11.5699929Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5700939Z static auto dtype() { 2025-05-07T19:57:11.5701240Z ^ 2025-05-07T19:57:11.5701380Z 2025-05-07T19:57:11.5702267Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5703330Z static auto dtype() { 2025-05-07T19:57:11.5703596Z ^ 2025-05-07T19:57:11.5703773Z 2025-05-07T19:57:11.5704553Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5705578Z static auto dtype() { 2025-05-07T19:57:11.5705844Z ^ 2025-05-07T19:57:11.5706019Z 2025-05-07T19:57:11.5706268Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:11.5706639Z 2025-05-07T19:57:11.5707416Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning 
#177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5708436Z static auto dtype() { 2025-05-07T19:57:11.5708730Z ^ 2025-05-07T19:57:11.5708869Z 2025-05-07T19:57:11.5709691Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5710769Z static auto dtype() { 2025-05-07T19:57:11.5711037Z ^ 2025-05-07T19:57:11.5711204Z 2025-05-07T19:57:11.5711984Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5713072Z static auto dtype() { 2025-05-07T19:57:11.5713330Z ^ 2025-05-07T19:57:11.5713507Z 2025-05-07T19:57:11.5713761Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:11.5714129Z 2025-05-07T19:57:11.5714932Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5715923Z static auto dtype() { 2025-05-07T19:57:11.5716240Z ^ 2025-05-07T19:57:11.5716379Z 2025-05-07T19:57:11.5717211Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5718277Z static auto dtype() { 2025-05-07T19:57:11.5718530Z ^ 2025-05-07T19:57:11.5718701Z 2025-05-07T19:57:11.5719476Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5720478Z static auto dtype() { 2025-05-07T19:57:11.5720735Z ^ 2025-05-07T19:57:11.5720980Z 
2025-05-07T19:57:11.5721234Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:11.5721599Z 2025-05-07T19:57:11.5722405Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5723386Z static auto dtype() { 2025-05-07T19:57:11.5723682Z ^ 2025-05-07T19:57:11.5723833Z 2025-05-07T19:57:11.5724673Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:11.5725735Z static auto dtype() { 2025-05-07T19:57:11.5725992Z ^ 2025-05-07T19:57:11.5726160Z 2025-05-07T19:57:43.3880020Z [89/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o 2025-05-07T19:57:43.3904095Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:43.3907055Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:43.3909378Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:57:43.3910577Z ^ 2025-05-07T19:57:43.3910912Z 2025-05-07T19:57:43.3911399Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:43.3912126Z 2025-05-07T19:57:43.3913820Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:43.3916160Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:57:43.3917074Z ^ 2025-05-07T19:57:43.3917352Z 2025-05-07T19:57:47.0840695Z [90/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o 2025-05-07T19:57:47.0861249Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:53.1253808Z [91/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o 2025-05-07T19:57:53.1278038Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:53.1281024Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:53.1283138Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:57:53.1283861Z ^ 2025-05-07T19:57:53.1284126Z 2025-05-07T19:57:53.1284863Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:53.1285536Z 2025-05-07T19:57:53.1287061Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:53.1289303Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:57:53.1290168Z ^ 2025-05-07T19:57:53.1290484Z 2025-05-07T19:57:54.7607591Z [92/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o 2025-05-07T19:57:54.7630277Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:57.9587698Z [93/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o 2025-05-07T19:57:57.9603343Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:57.9605594Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:57:57.9607495Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:57:57.9608319Z ^ 2025-05-07T19:57:57.9608563Z 2025-05-07T19:57:57.9608886Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:57.9609363Z 2025-05-07T19:57:57.9610551Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:57:57.9612450Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:57:57.9613328Z ^ 2025-05-07T19:57:57.9613595Z 2025-05-07T19:57:57.9614801Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:57:57.9616629Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:57:57.9617683Z ^ 2025-05-07T19:57:57.9617870Z 2025-05-07T19:57:57.9618185Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:57.9618678Z 2025-05-07T19:57:57.9619815Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:57:57.9621907Z 
__attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:57:57.9622751Z ^ 2025-05-07T19:57:57.9623052Z 2025-05-07T19:57:57.9623946Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced 2025-05-07T19:57:57.9625138Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8; 2025-05-07T19:57:57.9625547Z ^ 2025-05-07T19:57:57.9625721Z 2025-05-07T19:57:57.9626577Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced 2025-05-07T19:57:57.9627685Z constexpr int CVT_FP4_SF_VEC_SIZE = 16; 2025-05-07T19:57:57.9628126Z ^ 2025-05-07T19:57:57.9628295Z 2025-05-07T19:57:57.9629472Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:57:57.9631329Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:57:57.9632201Z ^ 2025-05-07T19:57:57.9632390Z 2025-05-07T19:57:57.9632909Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:57.9633413Z 2025-05-07T19:57:57.9634580Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:57:57.9636466Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:57:57.9637310Z ^ 2025-05-07T19:57:57.9637607Z 2025-05-07T19:57:57.9638490Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced 2025-05-07T19:57:57.9639691Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8; 2025-05-07T19:57:57.9640118Z ^ 2025-05-07T19:57:57.9640299Z 2025-05-07T19:57:57.9641183Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced 2025-05-07T19:57:57.9642300Z constexpr int CVT_FP4_SF_VEC_SIZE = 16; 2025-05-07T19:57:57.9642731Z ^ 2025-05-07T19:57:57.9642907Z 2025-05-07T19:57:57.9644073Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:57:57.9645947Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:57:57.9646794Z ^ 2025-05-07T19:57:57.9646984Z 2025-05-07T19:57:57.9647300Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:57.9647807Z 2025-05-07T19:57:57.9648981Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:57:57.9650904Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:57:57.9651747Z ^ 2025-05-07T19:57:57.9652045Z 2025-05-07T19:57:57.9653078Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced 2025-05-07T19:57:57.9654266Z constexpr int 
CVT_FP4_ELTS_PER_THREAD = 8; 2025-05-07T19:57:57.9654688Z ^ 2025-05-07T19:57:57.9654860Z 2025-05-07T19:57:57.9655719Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced 2025-05-07T19:57:57.9656842Z constexpr int CVT_FP4_SF_VEC_SIZE = 16; 2025-05-07T19:57:57.9657420Z ^ 2025-05-07T19:57:57.9657596Z 2025-05-07T19:57:57.9658787Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:57:57.9660626Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:57:57.9661503Z ^ 2025-05-07T19:57:57.9661686Z 2025-05-07T19:57:57.9662022Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:57.9662538Z 2025-05-07T19:57:57.9663694Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:57:57.9665782Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:57:57.9666625Z ^ 2025-05-07T19:57:57.9666923Z 2025-05-07T19:57:57.9668029Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced 2025-05-07T19:57:57.9669218Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8; 2025-05-07T19:57:57.9669618Z ^ 2025-05-07T19:57:57.9669793Z 2025-05-07T19:57:57.9670667Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable 
"fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced 2025-05-07T19:57:57.9671778Z constexpr int CVT_FP4_SF_VEC_SIZE = 16; 2025-05-07T19:57:57.9672203Z ^ 2025-05-07T19:57:57.9672384Z 2025-05-07T19:57:57.9673552Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:57:57.9675390Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:57:57.9676239Z ^ 2025-05-07T19:57:57.9676423Z 2025-05-07T19:57:57.9676736Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:57.9677240Z 2025-05-07T19:57:57.9678393Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:57:57.9680289Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:57:57.9681123Z ^ 2025-05-07T19:57:57.9681379Z 2025-05-07T19:57:57.9682579Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:57:57.9684458Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:57:57.9685329Z ^ 2025-05-07T19:57:57.9685513Z 2025-05-07T19:57:57.9686111Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:57.9686601Z 2025-05-07T19:57:57.9687753Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:57:57.9689677Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:57:57.9690511Z ^ 2025-05-07T19:57:57.9690820Z 2025-05-07T19:57:57.9691574Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I13__nv_bfloat16Lb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-05-07T19:57:57.9693241Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I13__nv_bfloat16Lb1EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-05-07T19:57:57.9694842Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I6__halfLb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-05-07T19:57:57.9696437Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I6__halfLb1EEEviiPKT_PKfPjS7_ is out of range. 
.minnctapersm will be ignored 2025-05-07T19:58:12.2809237Z [94/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o 2025-05-07T19:58:12.2829068Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:12.2831443Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:12.2833137Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:12.2833816Z ^ 2025-05-07T19:58:12.2834071Z 2025-05-07T19:58:12.2834398Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:12.2834926Z 2025-05-07T19:58:12.2836047Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:12.2837711Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:12.2838349Z ^ 2025-05-07T19:58:12.2838587Z 2025-05-07T19:58:13.5822330Z [95/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o 2025-05-07T19:58:13.5839641Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:22.5548422Z [96/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o 2025-05-07T19:58:22.5571625Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:22.5574418Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:22.5576344Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:22.5577144Z ^ 2025-05-07T19:58:22.5577609Z 2025-05-07T19:58:22.5578020Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:22.5578638Z 2025-05-07T19:58:22.5580076Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:22.5582207Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:22.5582973Z ^ 2025-05-07T19:58:22.5583255Z 2025-05-07T19:58:28.5982034Z [97/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o 2025-05-07T19:58:28.6006520Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets 
to suppress warning). 2025-05-07T19:58:28.6009428Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:28.6011624Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:28.6012443Z ^ 2025-05-07T19:58:28.6012810Z 2025-05-07T19:58:28.6013260Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:28.6013912Z 2025-05-07T19:58:28.6015501Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:28.6017868Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:28.6018672Z ^ 2025-05-07T19:58:28.6018949Z 2025-05-07T19:58:43.8000147Z [98/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o 2025-05-07T19:58:43.8022551Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to 
suppress warning). 2025-05-07T19:58:43.8025252Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:43.8027194Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:43.8027929Z ^ 2025-05-07T19:58:43.8028245Z 2025-05-07T19:58:43.8028654Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:43.8029280Z 2025-05-07T19:58:43.8030701Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:43.8032642Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:43.8033420Z ^ 2025-05-07T19:58:43.8033707Z 2025-05-07T19:58:44.9069653Z [99/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o 2025-05-07T19:58:44.9091363Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:44.9094084Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:44.9096109Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:44.9096819Z ^ 2025-05-07T19:58:44.9097113Z 2025-05-07T19:58:44.9097613Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:44.9098171Z 2025-05-07T19:58:44.9099565Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:44.9101436Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:44.9102147Z ^ 2025-05-07T19:58:44.9102409Z 2025-05-07T19:58:45.9755735Z [100/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o 2025-05-07T19:58:46.0234528Z nvcc warning : Support for offline compilation for architectures prior to 
'_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:46.0237226Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:46.0239300Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:46.0240056Z ^ 2025-05-07T19:58:46.0240364Z 2025-05-07T19:58:46.0240772Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:46.0241396Z 2025-05-07T19:58:46.0242822Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:46.0244835Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:46.0245595Z ^ 2025-05-07T19:58:46.0245880Z 2025-05-07T19:58:47.3056442Z [101/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o 2025-05-07T19:58:47.3079266Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:47.3081978Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:47.3084062Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:47.3084842Z ^ 2025-05-07T19:58:47.3085146Z 2025-05-07T19:58:47.3085557Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:47.3086121Z 2025-05-07T19:58:47.3087506Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:47.3089553Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:47.3090303Z ^ 2025-05-07T19:58:47.3090588Z 2025-05-07T19:58:48.1849490Z [102/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_example_py_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o -MF experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o.d -o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_nccl.cpp 2025-05-07T19:58:48.1868181Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:58:48.2656835Z [103/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o 2025-05-07T19:58:48.2681396Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:48.2684326Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:48.2686534Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:48.2687568Z ^ 2025-05-07T19:58:48.2687862Z 2025-05-07T19:58:48.2688270Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:48.2688821Z 2025-05-07T19:58:48.2690276Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:48.2692258Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:48.2692977Z ^ 2025-05-07T19:58:48.2693237Z 2025-05-07T19:58:55.9780819Z [104/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o -MF 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o 2025-05-07T19:58:55.9804211Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:55.9807021Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:55.9809060Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:55.9809848Z ^ 2025-05-07T19:58:55.9810555Z 2025-05-07T19:58:55.9810966Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:55.9811615Z 2025-05-07T19:58:55.9813086Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:55.9815179Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:55.9815980Z ^ 2025-05-07T19:58:55.9816275Z 2025-05-07T19:58:55.9828414Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_10multipliesES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1P_INS1Q_IS1R_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEES11_NS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S25_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES29_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:58:55.9853631Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1G_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1N_NS1G_6fusion15Sm90TreeVisitorINS1P_11Sm90ComputeINS_10multipliesES1O_fLNS_15FloatRoundStyleE2EvEEJNS1P_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1Q_INS1R_IS1S_ffLS1T_2EvEEJNS1P_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1P_12Sm90AccFetchEEEEEEES12_NS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S26_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2A_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:58:55.9878862Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1P_INS1Q_INS_10multipliesEffLS1S_2EvEEJS1W_NS1P_IS1Y_JNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEES11_NS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S27_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2B_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:58:55.9904626Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1G_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1N_NS1G_6fusion15Sm90TreeVisitorINS1P_11Sm90ComputeINS_4plusES1O_fLNS_15FloatRoundStyleE2EvEEJNS1P_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1Q_INS1R_INS_10multipliesEffLS1T_2EvEEJS1X_NS1Q_IS1Z_JNS1P_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1P_12Sm90AccFetchEEEEEEEEEES12_NS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S28_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2C_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:58:55.9930028Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_S1N_LNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_S1N_S1N_NS5_IJSC_SV_SV_EEELi8ELb1EEENS1P_INS1Q_INS_10multipliesES1N_fLS1S_2EvEEJNS1U_ILi0ESI_ffS1V_Li4ELb1EEENS1P_INS1Q_IS1X_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEES11_NS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S29_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2D_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:58:55.9955994Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1G_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1N_NS1G_6fusion15Sm90TreeVisitorINS1P_11Sm90ComputeINS_4plusES1O_S1O_LNS_15FloatRoundStyleE2EvEEJNS1P_16Sm90ColBroadcastILi0ESI_S1O_S1O_NS5_IJSC_SW_SW_EEELi8ELb1EEENS1Q_INS1R_INS_10multipliesES1O_fLS1T_2EvEEJNS1V_ILi0ESI_ffS1W_Li4ELb1EEENS1Q_INS1R_IS1Y_ffLS1T_2EvEEJNS1P_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1P_12Sm90AccFetchEEEEEEEEEES12_NS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S2A_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2E_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:58:56.5946035Z [105/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o 2025-05-07T19:58:56.5968168Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:56.5970879Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:56.5972755Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:56.5973501Z ^ 2025-05-07T19:58:56.5973775Z 2025-05-07T19:58:56.5974186Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:56.5974786Z 2025-05-07T19:58:56.5976198Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:56.5978281Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:56.5978971Z ^ 2025-05-07T19:58:56.5979256Z 2025-05-07T19:58:56.5990760Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES11_S1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1E_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1L_NS1E_6fusion15Sm90TreeVisitorINS1N_11Sm90ComputeINS_10multipliesES1M_fLNS_15FloatRoundStyleE2EvEEJNS1N_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1O_INS1P_IS1Q_ffLS1R_2EvEEJNS1N_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1N_12Sm90AccFetchEEEEEEENS4_13SM90_TMA_LOADENS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S25_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES29_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:58:56.6013422Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES12_S1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_10multipliesES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1P_INS1Q_IS1R_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEENS4_13SM90_TMA_LOADENS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S26_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2A_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:58:56.6036782Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES11_S1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1E_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1L_NS1E_6fusion15Sm90TreeVisitorINS1N_11Sm90ComputeINS_4plusES1M_fLNS_15FloatRoundStyleE2EvEEJNS1N_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1O_INS1P_INS_10multipliesEffLS1R_2EvEEJS1V_NS1O_IS1X_JNS1N_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1N_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S27_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2B_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:58:56.6060099Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES12_S1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1P_INS1Q_INS_10multipliesEffLS1S_2EvEEJS1W_NS1P_IS1Y_JNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S28_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2C_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:58:56.6083650Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES11_S1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1E_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1L_NS1E_6fusion15Sm90TreeVisitorINS1N_11Sm90ComputeINS_4plusES1M_S1M_LNS_15FloatRoundStyleE2EvEEJNS1N_16Sm90ColBroadcastILi0ESI_S1M_S1M_NS5_IJSC_SV_SV_EEELi8ELb1EEENS1O_INS1P_INS_10multipliesES1M_fLS1R_2EvEEJNS1T_ILi0ESI_ffS1U_Li4ELb1EEENS1O_INS1P_IS1W_ffLS1R_2EvEEJNS1N_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1N_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S29_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2D_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:58:56.6107715Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES12_S1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_S1N_LNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_S1N_S1N_NS5_IJSC_SW_SW_EEELi8ELb1EEENS1P_INS1Q_INS_10multipliesES1N_fLS1S_2EvEEJNS1U_ILi0ESI_ffS1V_Li4ELb1EEENS1P_INS1Q_IS1X_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S2A_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2E_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:58:57.6604427Z [106/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_example_py_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o -MF experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o.d -o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_ops.cpp 2025-05-07T19:58:57.6622701Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:59:00.4752029Z [107/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE 
-Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o 2025-05-07T19:59:00.4815812Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:00.4819205Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:00.4821292Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:00.4880223Z ^ 2025-05-07T19:59:00.4880872Z 2025-05-07T19:59:00.4881323Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:00.4882057Z 2025-05-07T19:59:00.4883543Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:00.4885776Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:00.4886557Z ^ 2025-05-07T19:59:00.4886894Z 2025-05-07T19:59:02.3396414Z [108/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o -MF 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o 2025-05-07T19:59:02.3418978Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:02.3421896Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:02.3423920Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:02.3424798Z ^ 2025-05-07T19:59:02.3425100Z 2025-05-07T19:59:02.3425541Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:02.3426214Z 2025-05-07T19:59:02.3427661Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:02.3429732Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:02.3430512Z ^ 2025-05-07T19:59:02.3430858Z 2025-05-07T19:59:04.2929193Z [109/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o 2025-05-07T19:59:04.2947279Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:37.4284511Z [110/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode 
arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16_fast_gemv.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o 2025-05-07T19:59:37.4296812Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:59:48.5941310Z [111/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o 2025-05-07T19:59:48.5955081Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:59:48.5956663Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:48.5957832Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:48.6013650Z ^ 2025-05-07T19:59:48.6013934Z 2025-05-07T19:59:48.6014203Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:48.6014579Z 2025-05-07T19:59:48.6075077Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:48.6076725Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:48.6077580Z ^ 2025-05-07T19:59:48.6077787Z 2025-05-07T19:59:52.7426816Z [112/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o 2025-05-07T19:59:52.7439727Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:59:53.2686848Z [113/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o 2025-05-07T19:59:53.2701910Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:59:53.2703545Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:53.2704715Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:53.2705205Z ^ 2025-05-07T19:59:53.2705390Z 2025-05-07T19:59:53.2705648Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:53.2706056Z 2025-05-07T19:59:53.2706887Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:53.2750493Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:53.2751383Z ^ 2025-05-07T19:59:53.2751612Z 2025-05-07T19:59:55.6690002Z [114/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o 2025-05-07T19:59:55.6702423Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:59:55.6704018Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:55.6705220Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:55.6705668Z ^ 2025-05-07T19:59:55.6705855Z 2025-05-07T19:59:55.6706112Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:55.6706469Z 2025-05-07T19:59:55.6707310Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:55.6708487Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:55.6708939Z ^ 2025-05-07T19:59:55.6709110Z 2025-05-07T19:59:56.5943332Z [115/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o 2025-05-07T19:59:56.5955696Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:59:56.5957438Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:56.5958587Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:56.5959056Z ^ 2025-05-07T19:59:56.5959232Z 2025-05-07T19:59:56.5959515Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:56.5959891Z 2025-05-07T19:59:56.5960719Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:56.5961893Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:56.5962333Z ^ 2025-05-07T19:59:56.5962524Z 2025-05-07T20:00:09.8873095Z [116/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o 2025-05-07T20:00:09.8885829Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to 
suppress warning). 2025-05-07T20:00:09.8887411Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:09.8888686Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:09.8889133Z ^ 2025-05-07T20:00:09.8889307Z 2025-05-07T20:00:09.8889602Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:09.8889965Z 2025-05-07T20:00:09.8890799Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:09.8891987Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:09.8892442Z ^ 2025-05-07T20:00:09.8892611Z 2025-05-07T20:00:13.3577062Z [117/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o 2025-05-07T20:00:13.3589444Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:14.1373698Z [118/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_example_py_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 
-DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o -MF experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/cutlass_sgemm_nn.cu -o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o 2025-05-07T20:00:14.1386041Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:14.8359578Z [119/156] : && /github/home/miniconda/envs/build_binary/bin/c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,fbgemm_gpu_experimental_example_py.so -o experimental/example/fbgemm_gpu_experimental_example_py.so experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 
-Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib:/github/home/miniconda/envs/build_binary/lib/stubs: /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libtorch.so /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libtorch.so" -Wl,--as-needed -lcudadevrt -lcudart_static -ldl && : 2025-05-07T20:00:14.8747944Z [120/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-build/experimental/example && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:00:14.8749544Z ################################################################################ 2025-05-07T20:00:14.8749963Z [CMAKE] Running post-build script ... 
2025-05-07T20:00:14.8750738Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:00:14.8751515Z Removing all RPATHs ... 2025-05-07T20:00:14.8751810Z ################################################################################ 2025-05-07T20:00:17.7917785Z [121/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode 
arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o 2025-05-07T20:00:17.7930632Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:17.7932214Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.7933380Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.7933833Z ^ 2025-05-07T20:00:17.7934006Z 2025-05-07T20:00:17.7934269Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:17.7934622Z 2025-05-07T20:00:17.7935449Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.7936626Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:17.7937079Z ^ 2025-05-07T20:00:17.7937367Z 2025-05-07T20:00:17.7938165Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.7939317Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.7939761Z ^ 2025-05-07T20:00:17.7940037Z detected during: 2025-05-07T20:00:17.7955165Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:17.7983920Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.8012717Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:17.8029193Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:17.8030428Z 2025-05-07T20:00:17.8030672Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:17.8031051Z 2025-05-07T20:00:17.8031844Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.8032968Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.8033373Z ^ 2025-05-07T20:00:17.8033624Z detected during: 2025-05-07T20:00:17.8047712Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:17.8076773Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:17.8107445Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.8136480Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:17.8153049Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:17.8154212Z 2025-05-07T20:00:17.8155109Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.8156271Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.8156767Z ^ 2025-05-07T20:00:17.8157055Z detected during: 2025-05-07T20:00:17.8172337Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:17.8200804Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.8229712Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:17.8246218Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:17.8247366Z 2025-05-07T20:00:17.8247630Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:17.8247991Z 2025-05-07T20:00:17.8248795Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.8249928Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.8250328Z ^ 2025-05-07T20:00:17.8250584Z detected during: 2025-05-07T20:00:17.8264633Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, 
TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:17.8293625Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:17.8322160Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.8351109Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:17.8367750Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:17.8368901Z 2025-05-07T20:00:17.8369702Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.8370875Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.8371340Z ^ 2025-05-07T20:00:17.8371606Z detected during: 2025-05-07T20:00:17.8386585Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:17.8415074Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.8445315Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:17.8461726Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:17.8462882Z 2025-05-07T20:00:17.8463141Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:17.8463536Z 2025-05-07T20:00:17.8464344Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.8465494Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.8465914Z ^ 2025-05-07T20:00:17.8466191Z detected during: 2025-05-07T20:00:17.8480430Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:17.8509273Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:17.8537808Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.8566968Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:17.8583541Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, 
TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:17.8584807Z 2025-05-07T20:00:17.8586067Z ptxas /tmp/tmpxft_00008c89_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 925; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:17.8588663Z ptxas /tmp/tmpxft_00008c89_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 937; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:17.8591267Z ptxas /tmp/tmpxft_00008c89_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 1076; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:17.8593849Z ptxas /tmp/tmpxft_00008c89_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 1088; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:17.8596011Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.8597187Z 
Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.8597642Z ^ 2025-05-07T20:00:17.8597944Z detected during: 2025-05-07T20:00:17.8612862Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:17.8641550Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.8670774Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:17.8687324Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:17.8688521Z 2025-05-07T20:00:17.8688763Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:17.8689200Z 2025-05-07T20:00:17.8690003Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.8691124Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.8691517Z ^ 2025-05-07T20:00:17.8691746Z detected during: 2025-05-07T20:00:17.8705809Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:17.8734403Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:00:17.8764000Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 
of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.8793089Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:17.8809466Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:17.8810619Z 2025-05-07T20:00:17.8811411Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.8812553Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.8813043Z ^ 2025-05-07T20:00:17.8813309Z detected during: 2025-05-07T20:00:17.8828219Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:17.8856730Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.8885612Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:17.8902021Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, 
USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:17.8903168Z 2025-05-07T20:00:17.8903416Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:17.8903770Z 2025-05-07T20:00:17.8904580Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.8905686Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.8906089Z ^ 2025-05-07T20:00:17.8906301Z detected during: 2025-05-07T20:00:17.8920179Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:17.8948753Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:17.8977511Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.9006399Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, 
cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:17.9022793Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:17.9023948Z 2025-05-07T20:00:17.9024761Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning 
#2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.9025901Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.9026353Z ^ 2025-05-07T20:00:17.9026604Z detected during: 2025-05-07T20:00:17.9041783Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:17.9070408Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.9100584Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, 
cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:17.9116966Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:17.9118120Z 2025-05-07T20:00:17.9118376Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:17.9118730Z 2025-05-07T20:00:17.9119534Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.9120645Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.9121036Z ^ 2025-05-07T20:00:17.9121262Z detected during: 2025-05-07T20:00:17.9135174Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, 
SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:17.9163923Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const 
cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:17.9192470Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.9221243Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:17.9237513Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:17.9238658Z 2025-05-07T20:00:19.3848816Z [122/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o 2025-05-07T20:00:19.3862636Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:19.3864272Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.3865442Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.3865952Z ^ 2025-05-07T20:00:19.3866136Z 2025-05-07T20:00:19.3866423Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:19.3866792Z 2025-05-07T20:00:19.3867817Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.3869046Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:19.3869510Z ^ 2025-05-07T20:00:19.3869721Z 2025-05-07T20:00:19.3870537Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.3871713Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.3872301Z ^ 2025-05-07T20:00:19.3872606Z detected during: 2025-05-07T20:00:19.3887651Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.3916507Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.3945487Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.3961912Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:19.3963110Z 2025-05-07T20:00:19.3963369Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:19.3963770Z 2025-05-07T20:00:19.3964580Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.3965747Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.3966176Z ^ 2025-05-07T20:00:19.3966449Z detected during: 2025-05-07T20:00:19.3980777Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:19.4009769Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.4038501Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.4067735Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.4084155Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:19.4085341Z 2025-05-07T20:00:19.4086154Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.4087358Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.4087817Z ^ 2025-05-07T20:00:19.4088129Z detected during: 2025-05-07T20:00:19.4103196Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.4131688Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.4160745Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.4177455Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:19.4178654Z 2025-05-07T20:00:19.4178913Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:19.4179290Z 2025-05-07T20:00:19.4180139Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.4181275Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.4181735Z ^ 2025-05-07T20:00:19.4181973Z detected during: 2025-05-07T20:00:19.4196031Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, 
TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:19.4224915Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.4253444Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.4282690Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.4299163Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:19.4300344Z 2025-05-07T20:00:19.4301180Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.4302355Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.4302848Z ^ 2025-05-07T20:00:19.4303137Z detected during: 2025-05-07T20:00:19.4318002Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.4346766Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.4375971Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.4392402Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:19.4393580Z 2025-05-07T20:00:19.4393837Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:19.4394201Z 2025-05-07T20:00:19.4395038Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.4396220Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.4396652Z ^ 2025-05-07T20:00:19.4396885Z detected during: 2025-05-07T20:00:19.4410862Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:19.4439721Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.4468538Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.4497724Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.4514028Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, 
TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:19.4515211Z 2025-05-07T20:00:19.4516456Z ptxas /tmp/tmpxft_00008c88_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 925; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:19.4519114Z ptxas /tmp/tmpxft_00008c88_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 937; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:19.4521668Z ptxas /tmp/tmpxft_00008c88_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 1076; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:19.4524256Z ptxas /tmp/tmpxft_00008c88_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 1088; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:19.4526418Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.4527593Z 
Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.4528048Z ^ 2025-05-07T20:00:19.4528340Z detected during: 2025-05-07T20:00:19.4543512Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.4572202Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.4601112Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.4617540Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:19.4618727Z 2025-05-07T20:00:19.4618979Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:19.4619347Z 2025-05-07T20:00:19.4620175Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.4621288Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.4621722Z ^ 2025-05-07T20:00:19.4621955Z detected during: 2025-05-07T20:00:19.4636041Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:19.4665717Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:00:19.4694878Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 
of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.4723771Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.4740250Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:19.4741484Z 2025-05-07T20:00:19.4742312Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.4743461Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.4743942Z ^ 2025-05-07T20:00:19.4744215Z detected during: 2025-05-07T20:00:19.4759092Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.4787760Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.4816708Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.4833146Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, 
USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:19.4834371Z 2025-05-07T20:00:19.4834654Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:19.4835022Z 2025-05-07T20:00:19.4835825Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.4836977Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.4837392Z ^ 2025-05-07T20:00:19.4837658Z detected during: 2025-05-07T20:00:19.4851560Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:19.4880574Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.4909079Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.4938000Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, 
cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.4954323Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:19.4955481Z 2025-05-07T20:00:19.4956284Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning 
#2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.4957459Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.4957925Z ^ 2025-05-07T20:00:19.4958191Z detected during: 2025-05-07T20:00:19.4973407Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.5001944Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.5030834Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, 
cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.5047233Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:19.5048420Z 2025-05-07T20:00:19.5048669Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:19.5049053Z 2025-05-07T20:00:19.5049865Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.5050984Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.5051419Z ^ 2025-05-07T20:00:19.5051751Z detected during: 2025-05-07T20:00:19.5065832Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, 
SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:19.5095175Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const 
cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.5123674Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.5152993Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.5169573Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:19.5170750Z 2025-05-07T20:00:27.4272085Z [123/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o 2025-05-07T20:00:27.4284501Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:27.4286075Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:27.4287200Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:27.4287654Z ^ 2025-05-07T20:00:27.4287826Z 2025-05-07T20:00:27.4288068Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:27.4288430Z 2025-05-07T20:00:27.4289245Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:27.4290397Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:27.4290851Z ^ 2025-05-07T20:00:27.4291040Z 2025-05-07T20:00:33.2137804Z [124/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o 2025-05-07T20:00:33.2150224Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:33.2151981Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:33.2153327Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:33.2153881Z ^ 2025-05-07T20:00:33.2154141Z 2025-05-07T20:00:33.2154390Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:33.2154756Z 2025-05-07T20:00:33.2155717Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:33.2157021Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:33.2157546Z ^ 2025-05-07T20:00:33.2157789Z 2025-05-07T20:00:33.2158760Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:33.2160064Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:33.2160581Z ^ 2025-05-07T20:00:33.2160806Z 2025-05-07T20:00:33.2161076Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:33.2161436Z 2025-05-07T20:00:33.2162373Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:33.2163800Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 
2025-05-07T20:00:33.2164297Z ^ 2025-05-07T20:00:33.2164560Z 2025-05-07T20:00:33.2165505Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:33.2166835Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:33.2167532Z ^ 2025-05-07T20:00:33.2167791Z 2025-05-07T20:00:33.2168141Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:33.2168499Z 2025-05-07T20:00:33.2169459Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:33.2170750Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:33.2171268Z ^ 2025-05-07T20:00:33.2171635Z 2025-05-07T20:00:33.2172606Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:33.2173918Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:33.2174423Z ^ 2025-05-07T20:00:33.2174648Z 2025-05-07T20:00:33.2174894Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:33.2175277Z 2025-05-07T20:00:33.2176223Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:33.2177640Z __inline__ 
__attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:33.2178145Z ^ 2025-05-07T20:00:33.2178413Z 2025-05-07T20:00:33.2179361Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:33.2180694Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:33.2181178Z ^ 2025-05-07T20:00:33.2181429Z 2025-05-07T20:00:33.2181675Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:33.2182034Z 2025-05-07T20:00:33.2182991Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:33.2184293Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:33.2184804Z ^ 2025-05-07T20:00:33.2185041Z 2025-05-07T20:00:33.2185974Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:33.2188181Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:33.2188673Z ^ 2025-05-07T20:00:33.2188914Z 2025-05-07T20:00:33.2189157Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:33.2189513Z 2025-05-07T20:00:33.2190459Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first 
declaration 2025-05-07T20:00:33.2191768Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:33.2192271Z ^ 2025-05-07T20:00:33.2192507Z 2025-05-07T20:00:33.2193451Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:33.2194769Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:33.2195255Z ^ 2025-05-07T20:00:33.2195546Z 2025-05-07T20:00:33.2195789Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:33.2196166Z 2025-05-07T20:00:33.2197098Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:33.2198409Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:33.2198900Z ^ 2025-05-07T20:00:33.2199164Z 2025-05-07T20:00:51.1347861Z [125/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o 2025-05-07T20:00:51.1369820Z 
nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:51.1372606Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:51.1374556Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:51.1375284Z ^ 2025-05-07T20:00:51.1375625Z 2025-05-07T20:00:51.1376029Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:51.1377006Z 2025-05-07T20:00:51.1378614Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:51.1380570Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:51.1381302Z ^ 2025-05-07T20:00:51.1381602Z 2025-05-07T20:00:51.1383252Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:51.1385514Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:51.1386281Z ^ 2025-05-07T20:00:51.1386666Z 2025-05-07T20:00:51.1388294Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:51.1390499Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:51.1391374Z ^ 2025-05-07T20:00:51.1391767Z 
2025-05-07T20:00:51.1393375Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:51.1395561Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:51.1396411Z ^ 2025-05-07T20:00:51.1396805Z 2025-05-07T20:00:51.1397245Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:51.1397849Z 2025-05-07T20:00:51.1399420Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:51.1401626Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:51.1402708Z ^ 2025-05-07T20:00:51.1403137Z 2025-05-07T20:00:51.1404729Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:51.1406960Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:51.1407698Z ^ 2025-05-07T20:00:51.1408075Z 2025-05-07T20:00:51.1408484Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:51.1409068Z 2025-05-07T20:00:51.1410648Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:51.1412852Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 
2025-05-07T20:00:51.1413731Z ^ 2025-05-07T20:00:51.1414133Z 2025-05-07T20:00:51.1415747Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:51.1418197Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:51.1419035Z ^ 2025-05-07T20:00:51.1419406Z 2025-05-07T20:00:51.1419812Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:51.1420460Z 2025-05-07T20:00:51.1422040Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:51.1424267Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:51.1425072Z ^ 2025-05-07T20:00:51.1425514Z 2025-05-07T20:00:51.1427109Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:51.1429370Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:51.1430151Z ^ 2025-05-07T20:00:51.1430557Z 2025-05-07T20:00:51.1430976Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:51.1431582Z 2025-05-07T20:00:51.1433155Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:51.1435247Z __inline__ 
__attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:51.1436077Z ^ 2025-05-07T20:00:51.1436478Z 2025-05-07T20:00:51.1438031Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:51.1440186Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:51.1441175Z ^ 2025-05-07T20:00:51.1441544Z 2025-05-07T20:00:51.1441952Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:51.1442575Z 2025-05-07T20:00:51.1444133Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:51.1446287Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:51.1447067Z ^ 2025-05-07T20:00:51.1447508Z 2025-05-07T20:00:51.1449081Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:51.1451292Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:51.1452090Z ^ 2025-05-07T20:00:51.1452469Z 2025-05-07T20:00:51.1452902Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:51.1453679Z 2025-05-07T20:00:51.1455228Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first 
declaration 2025-05-07T20:00:51.1457539Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:51.1458405Z ^ 2025-05-07T20:00:51.1458817Z 2025-05-07T20:01:15.3305854Z [126/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o 2025-05-07T20:01:15.3326275Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:15.3339326Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:15.3341472Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:15.3342204Z ^ 2025-05-07T20:01:15.3342465Z 2025-05-07T20:01:15.3342858Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:15.3343436Z 2025-05-07T20:01:15.3344760Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:15.3347016Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:15.3347714Z ^ 2025-05-07T20:01:15.3348000Z 2025-05-07T20:01:30.8599464Z [127/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o 2025-05-07T20:01:30.8620444Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:30.8623017Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:30.8624861Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:30.8625599Z ^ 2025-05-07T20:01:30.8625894Z 2025-05-07T20:01:30.8626299Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:30.8626863Z 2025-05-07T20:01:30.8628211Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:30.8630472Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:30.8631202Z ^ 2025-05-07T20:01:30.8631471Z 2025-05-07T20:01:30.8632782Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:30.8634690Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:30.8635417Z ^ 2025-05-07T20:01:30.8635829Z detected during: 2025-05-07T20:01:30.8660174Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:30.8711323Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:30.8762743Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:30.8792742Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:30.8794908Z 2025-05-07T20:01:30.8795340Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:30.8795998Z 2025-05-07T20:01:30.8797768Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:30.8799768Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:30.8800494Z ^ 2025-05-07T20:01:30.8800904Z detected during: 2025-05-07T20:01:30.8826535Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:30.8874444Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:30.8923461Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:30.8975983Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:30.9005879Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 
2025-05-07T20:01:30.9007910Z 2025-05-07T20:01:30.9009323Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:30.9011393Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:30.9012161Z ^ 2025-05-07T20:01:30.9012623Z detected during: 2025-05-07T20:01:30.9036652Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:30.9086250Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:30.9139043Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:30.9169497Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:30.9171536Z 2025-05-07T20:01:30.9171967Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:30.9172585Z 2025-05-07T20:01:30.9174038Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:30.9175964Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:30.9176661Z ^ 2025-05-07T20:01:30.9177013Z detected during: 2025-05-07T20:01:30.9200061Z instantiation of "auto 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, 
ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:30.9247557Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:30.9299634Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:30.9347018Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:30.9374271Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:30.9376219Z 2025-05-07T20:01:30.9377640Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:30.9379526Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:30.9380225Z ^ 2025-05-07T20:01:30.9380620Z detected during: 2025-05-07T20:01:30.9405325Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, 
cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:30.9451589Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:30.9502201Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:30.9531305Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:30.9533341Z 2025-05-07T20:01:30.9533760Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:30.9534401Z 2025-05-07T20:01:30.9535825Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:30.9537921Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:30.9538604Z ^ 2025-05-07T20:01:30.9538979Z detected during: 2025-05-07T20:01:30.9564871Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:30.9617385Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:30.9663189Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:30.9714974Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:30.9744932Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:30.9747018Z 2025-05-07T20:01:30.9748475Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:30.9750577Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:30.9751361Z ^ 2025-05-07T20:01:30.9751802Z detected during: 2025-05-07T20:01:30.9779744Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:30.9828834Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:30.9879820Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:30.9909459Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:30.9911537Z 2025-05-07T20:01:30.9911974Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:30.9912615Z 2025-05-07T20:01:30.9914289Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 
2025-05-07T20:01:30.9916413Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:30.9917143Z ^ 2025-05-07T20:01:30.9917487Z detected during: 2025-05-07T20:01:30.9944005Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:30.9994152Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:31.0040119Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:31.0091735Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:31.0118389Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:31.0120291Z 2025-05-07T20:01:31.0121581Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:31.0123387Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:31.0124092Z ^ 2025-05-07T20:01:31.0124467Z detected during: 2025-05-07T20:01:31.0148752Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, 
cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:31.0185306Z instantiation of "void 
cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:31.0213711Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:31.0229935Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:31.0231159Z 2025-05-07T20:01:31.0231417Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:31.0231776Z 2025-05-07T20:01:31.0232583Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:31.0233707Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:31.0234108Z ^ 2025-05-07T20:01:31.0234347Z detected during: 2025-05-07T20:01:31.0248277Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, 
SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:31.0277174Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:31.0306627Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:31.0335221Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:31.0351472Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:31.0352619Z 2025-05-07T20:01:31.0353489Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:31.0354638Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:31.0355085Z ^ 2025-05-07T20:01:31.0355340Z detected during: 2025-05-07T20:01:31.0370269Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:31.0398524Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:31.0427179Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:31.0443362Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:31.0444497Z 2025-05-07T20:01:31.0444741Z Remark: The warnings can be 
suppressed with "-diag-suppress " 2025-05-07T20:01:31.0445106Z 2025-05-07T20:01:31.0445906Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:31.0447016Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:31.0447407Z ^ 2025-05-07T20:01:31.0447634Z detected during: 2025-05-07T20:01:31.0461669Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:31.0490482Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:31.0518447Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:31.0546938Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:31.0563171Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:31.0564326Z 2025-05-07T20:01:36.6193980Z [128/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o -MF 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o 2025-05-07T20:01:36.6209974Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:36.6211583Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:36.6212765Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:36.6213212Z ^ 2025-05-07T20:01:36.6213389Z 2025-05-07T20:01:36.6213660Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:36.6214026Z 2025-05-07T20:01:36.6214990Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:36.6216195Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:36.6216674Z ^ 2025-05-07T20:01:36.6216852Z 2025-05-07T20:01:36.6217773Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:36.6218955Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:36.6219407Z ^ 2025-05-07T20:01:36.6219701Z detected during: 2025-05-07T20:01:36.6234759Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, 
TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:36.6264276Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:36.6293625Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:36.6310119Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:36.6311283Z 2025-05-07T20:01:36.6311525Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:36.6311899Z 2025-05-07T20:01:36.6312695Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:36.6379537Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:36.6379966Z ^ 2025-05-07T20:01:36.6380205Z detected during: 2025-05-07T20:01:36.6394593Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, 
TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:36.6423791Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:36.6452390Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:36.6481665Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments 
&, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:36.6498272Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:36.6499424Z 2025-05-07T20:01:36.6500235Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:36.6501408Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:36.6501853Z ^ 2025-05-07T20:01:36.6502109Z detected during: 2025-05-07T20:01:36.6517208Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:36.6545745Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, 
cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:36.6574902Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:36.6591485Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:36.6592625Z 2025-05-07T20:01:36.6592895Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:36.6593260Z 2025-05-07T20:01:36.6594123Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:36.6595267Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:36.6595675Z ^ 2025-05-07T20:01:36.6595929Z detected during: 2025-05-07T20:01:36.6609975Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, 
TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:36.6639036Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:36.6669104Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:36.6698290Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:36.6714848Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:36.6715995Z 2025-05-07T20:01:36.6716790Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:36.6717938Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:36.6718381Z ^ 2025-05-07T20:01:36.6718631Z detected during: 2025-05-07T20:01:36.6733796Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with 
ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:01:36.6762309Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:36.6791577Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:36.6808006Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:36.6809149Z 2025-05-07T20:01:36.6809393Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:36.6809761Z 2025-05-07T20:01:36.6810564Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:36.6811673Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:36.6812068Z ^ 2025-05-07T20:01:36.6812296Z detected during: 2025-05-07T20:01:36.6826411Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, 
ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:36.6855325Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, 
int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:36.6884353Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, 
cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:36.6913604Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:36.6930027Z instantiation of 
"at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:36.6931193Z 2025-05-07T20:01:36.6932009Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:36.6933143Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:36.6933592Z ^ 2025-05-07T20:01:36.6933844Z detected during: 2025-05-07T20:01:36.6948928Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:36.6978805Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, 
cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:36.7007972Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:36.7024457Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:36.7025639Z 2025-05-07T20:01:36.7025894Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:36.7026247Z 2025-05-07T20:01:36.7027047Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 
2025-05-07T20:01:36.7028161Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:36.7028555Z ^ 2025-05-07T20:01:36.7028788Z detected during: 2025-05-07T20:01:36.7042960Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:36.7072225Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:36.7100786Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:36.7129867Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:36.7146540Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:36.7147690Z 2025-05-07T20:01:36.7148479Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:36.7149621Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:36.7150086Z ^ 2025-05-07T20:01:36.7150366Z detected during: 2025-05-07T20:01:36.7165422Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:36.7194170Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:36.7223193Z instantiation of "cutlass::Status 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:36.7239690Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:36.7240829Z 2025-05-07T20:01:36.7241068Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:36.7241434Z 2025-05-07T20:01:36.7242233Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:36.7243340Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:36.7243803Z ^ 2025-05-07T20:01:36.7244029Z detected during: 2025-05-07T20:01:36.7257999Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:36.7287100Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, 
cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:36.7316836Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:36.7346120Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:36.7362541Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:36.7363693Z 2025-05-07T20:01:36.7364483Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:36.7365656Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:36.7366089Z ^ 2025-05-07T20:01:36.7366350Z detected during: 2025-05-07T20:01:36.7381559Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:36.7410005Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:36.7439049Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:36.7455486Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:36.7456641Z 2025-05-07T20:01:36.7456880Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:36.7457302Z 2025-05-07T20:01:36.7458216Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:36.7459312Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:36.7459722Z ^ 2025-05-07T20:01:36.7459942Z detected during: 
2025-05-07T20:01:36.7474135Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" 
at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:36.7503218Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:36.7532015Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:36.7561032Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:36.7577851Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:36.7579010Z 2025-05-07T20:01:39.6505618Z [129/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o 2025-05-07T20:01:39.6518526Z nvcc warning : Support for offline compilation for 
architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:39.6520091Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.6521320Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.6521754Z ^ 2025-05-07T20:01:39.6521938Z 2025-05-07T20:01:39.6522181Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.6522561Z 2025-05-07T20:01:39.6523432Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.6524610Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:39.6525089Z ^ 2025-05-07T20:01:39.6525266Z 2025-05-07T20:01:39.8197191Z [130/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o 2025-05-07T20:01:39.8210007Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use 
-Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:39.8211599Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.8212756Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.8213328Z ^ 2025-05-07T20:01:39.8213501Z 2025-05-07T20:01:39.8213752Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.8214140Z 2025-05-07T20:01:39.8214971Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.8216163Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:39.8216606Z ^ 2025-05-07T20:01:39.8216804Z 2025-05-07T20:01:39.8217698Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.8218871Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.8219318Z ^ 2025-05-07T20:01:39.8219610Z detected during: 2025-05-07T20:01:39.8234722Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, 
cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.8263442Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.8293297Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.8309925Z instantiation of 
"at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:39.8311118Z 2025-05-07T20:01:39.8311373Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.8311740Z 2025-05-07T20:01:39.8312573Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.8313697Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.8314126Z ^ 2025-05-07T20:01:39.8314361Z detected during: 2025-05-07T20:01:39.8328486Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, 
int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.8357486Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.8386388Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, 
cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.8415570Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.8432079Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:39.8433251Z 2025-05-07T20:01:39.8434076Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.8435286Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.8435758Z ^ 2025-05-07T20:01:39.8436025Z detected during: 2025-05-07T20:01:39.8451159Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.8480187Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.8509273Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.8525820Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:39.8527006Z 2025-05-07T20:01:39.8527283Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.8527646Z 2025-05-07T20:01:39.8528448Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.8529587Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.8529995Z ^ 2025-05-07T20:01:39.8530257Z detected during: 2025-05-07T20:01:39.8544364Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, 
TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.8573569Z 
instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.8602537Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.8631707Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.8648272Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:39.8649430Z 2025-05-07T20:01:39.8650322Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.8651503Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.8651979Z ^ 2025-05-07T20:01:39.8652253Z detected during: 2025-05-07T20:01:39.8667626Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.8696502Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.8725655Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.8742242Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) 
[with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:39.8743418Z 2025-05-07T20:01:39.8743664Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.8744049Z 2025-05-07T20:01:39.8744852Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.8745965Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.8746398Z ^ 2025-05-07T20:01:39.8746660Z detected during: 2025-05-07T20:01:39.8760734Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.8790056Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.8818732Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.8848304Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, 
cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.8865106Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:39.8866290Z 2025-05-07T20:01:39.8867106Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy 
capture of "this" is deprecated 2025-05-07T20:01:39.8868494Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.8869047Z ^ 2025-05-07T20:01:39.8869349Z detected during: 2025-05-07T20:01:39.8884694Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.8913371Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.8942560Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.8959166Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:39.8960384Z 2025-05-07T20:01:39.8960638Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.8961003Z 2025-05-07T20:01:39.8961819Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.8962938Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.8963359Z ^ 2025-05-07T20:01:39.8963593Z detected during: 2025-05-07T20:01:39.8977957Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.9006982Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.9035648Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.9064789Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.9081686Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:39.9082845Z 2025-05-07T20:01:39.9083674Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.9084828Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.9085305Z ^ 2025-05-07T20:01:39.9085580Z detected during: 2025-05-07T20:01:39.9100947Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.9129609Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.9159040Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.9175695Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:39.9176845Z 2025-05-07T20:01:39.9177159Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.9177523Z 2025-05-07T20:01:39.9178332Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.9179463Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.9179871Z ^ 2025-05-07T20:01:39.9180122Z detected during: 2025-05-07T20:01:39.9194335Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, 
GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.9223324Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.9252063Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.9281267Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.9297774Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:39.9298930Z 2025-05-07T20:01:39.9299739Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.9301034Z Tensor mSFB_tmp = 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.9301518Z ^ 2025-05-07T20:01:39.9301796Z detected during: 2025-05-07T20:01:39.9317024Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.9345737Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.9374954Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.9391697Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:39.9392916Z 2025-05-07T20:01:39.9393165Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.9393554Z 2025-05-07T20:01:39.9394352Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.9395490Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.9395900Z ^ 2025-05-07T20:01:39.9396148Z detected during: 2025-05-07T20:01:39.9410237Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, 
ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.9439173Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.9467966Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:01:39.9496953Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.9513603Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:39.9514783Z 2025-05-07T20:01:40.4509600Z [131/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o 2025-05-07T20:01:40.4523608Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:40.4525196Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.4526429Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.4526907Z ^ 2025-05-07T20:01:40.4527090Z 2025-05-07T20:01:40.4527343Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:40.4527711Z 2025-05-07T20:01:40.4528568Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.4529732Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:40.4530203Z ^ 2025-05-07T20:01:40.4530383Z 2025-05-07T20:01:40.4531300Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.4532453Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.4532936Z ^ 2025-05-07T20:01:40.4533213Z detected during: 2025-05-07T20:01:40.4548356Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.4577349Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.4606537Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.4623280Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:40.4624443Z 2025-05-07T20:01:40.4624717Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:40.4625081Z 2025-05-07T20:01:40.4680158Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.4681541Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.4681941Z ^ 2025-05-07T20:01:40.4682181Z detected during: 2025-05-07T20:01:40.4696542Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:40.4725791Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.4754407Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.4783850Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.4800257Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:40.4801416Z 2025-05-07T20:01:40.4802227Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.4803371Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.4803807Z ^ 2025-05-07T20:01:40.4804074Z detected during: 2025-05-07T20:01:40.4819175Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.4847885Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.4877537Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.4893997Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:40.4895204Z 2025-05-07T20:01:40.4895446Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:40.4895816Z 2025-05-07T20:01:40.4896612Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.4897762Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.4898170Z ^ 2025-05-07T20:01:40.4898402Z detected during: 2025-05-07T20:01:40.4912542Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, 
TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:40.4941564Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.4970381Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.4999541Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.5016087Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:40.5017290Z 2025-05-07T20:01:40.5018085Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.5019226Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.5019663Z ^ 2025-05-07T20:01:40.5019927Z detected during: 2025-05-07T20:01:40.5034977Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.5063454Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.5092656Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.5109169Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:40.5110345Z 2025-05-07T20:01:40.5110590Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:40.5110947Z 2025-05-07T20:01:40.5111756Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.5112856Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.5113263Z ^ 2025-05-07T20:01:40.5113480Z detected during: 2025-05-07T20:01:40.5127683Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:40.5157002Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.5185960Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.5215199Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.5232353Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, 
TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:40.5233511Z 2025-05-07T20:01:40.5234329Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.5235508Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.5235957Z ^ 2025-05-07T20:01:40.5236215Z detected during: 2025-05-07T20:01:40.5251184Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.5280165Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.5309295Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.5325757Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:40.5326969Z 2025-05-07T20:01:40.5327224Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:40.5327584Z 2025-05-07T20:01:40.5328415Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.5329538Z return 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.5329974Z ^ 2025-05-07T20:01:40.5330213Z detected during: 2025-05-07T20:01:40.5344455Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:40.5373618Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.5402371Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.5431466Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, 
void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.5448116Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:40.5449268Z 2025-05-07T20:01:40.5450160Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.5451308Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.5451784Z ^ 2025-05-07T20:01:40.5452059Z detected during: 2025-05-07T20:01:40.5467144Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.5495980Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.5525041Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.5541630Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:40.5542789Z 2025-05-07T20:01:40.5543057Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:40.5543417Z 2025-05-07T20:01:40.5544231Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.5545371Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.5545779Z ^ 2025-05-07T20:01:40.5546033Z detected during: 2025-05-07T20:01:40.5560177Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, 
AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:40.5589369Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.5618120Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.5647692Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void 
*, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.5664239Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:40.5665399Z 2025-05-07T20:01:40.5666205Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.5667543Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.5668020Z ^ 2025-05-07T20:01:40.5668384Z detected during: 2025-05-07T20:01:40.5683509Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.5712142Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.5741417Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.5757951Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:40.5759157Z 2025-05-07T20:01:40.5759407Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:40.5759795Z 2025-05-07T20:01:40.5760604Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.5761747Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.5762158Z ^ 2025-05-07T20:01:40.5762416Z detected during: 2025-05-07T20:01:40.5776763Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, 
SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:40.5805815Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.5834470Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.5863943Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.5881557Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:40.5882739Z 2025-05-07T20:01:44.6547025Z [132/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o 2025-05-07T20:01:44.6559668Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:44.6561289Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.6562461Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.6562939Z ^ 2025-05-07T20:01:44.6563117Z 2025-05-07T20:01:44.6563374Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:44.6563837Z 2025-05-07T20:01:44.6564666Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.6565857Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:44.6566310Z ^ 2025-05-07T20:01:44.6566510Z 2025-05-07T20:01:44.6567501Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.6568680Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.6569131Z ^ 2025-05-07T20:01:44.6569420Z detected during: 2025-05-07T20:01:44.6584712Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:01:44.6613970Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 
of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.6643064Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.6659845Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:44.6661018Z 2025-05-07T20:01:44.6661267Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:44.6661654Z 2025-05-07T20:01:44.6662460Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.6663584Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.6663990Z ^ 2025-05-07T20:01:44.6664256Z detected during: 2025-05-07T20:01:44.6678536Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, 
SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:44.6707472Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.6736083Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.6765263Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, 
void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.6782013Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:44.6783173Z 2025-05-07T20:01:44.6783985Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.6785215Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.6785664Z ^ 2025-05-07T20:01:44.6785965Z detected during: 2025-05-07T20:01:44.6800985Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.6829687Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.6858806Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.6876882Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:44.6878052Z 2025-05-07T20:01:44.6878369Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:44.6878731Z 2025-05-07T20:01:44.6879554Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.6880666Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.6881103Z ^ 2025-05-07T20:01:44.6881332Z detected during: 2025-05-07T20:01:44.6895429Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, 
SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:44.6924782Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.6953578Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.6983035Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.6999685Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:44.7000817Z 2025-05-07T20:01:44.7001678Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.7002809Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.7003255Z ^ 2025-05-07T20:01:44.7003509Z detected during: 2025-05-07T20:01:44.7018565Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.7047048Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.7076323Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.7092846Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:44.7093989Z 2025-05-07T20:01:44.7094248Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:44.7094602Z 2025-05-07T20:01:44.7095400Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.7096557Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.7096974Z ^ 2025-05-07T20:01:44.7097271Z detected during: 2025-05-07T20:01:44.7111349Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:44.7140523Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.7169350Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.7198629Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.7215160Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:44.7216294Z 2025-05-07T20:01:44.7217145Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.7218295Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.7218739Z ^ 2025-05-07T20:01:44.7218991Z detected during: 2025-05-07T20:01:44.7234037Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.7263052Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.7292370Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.7308955Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 
22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:44.7310146Z 2025-05-07T20:01:44.7310399Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:44.7310784Z 2025-05-07T20:01:44.7311588Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.7312723Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.7313134Z ^ 2025-05-07T20:01:44.7313392Z detected during: 2025-05-07T20:01:44.7327517Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:44.7356703Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.7385403Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.7414401Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.7430856Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:44.7431997Z 2025-05-07T20:01:44.7432792Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of 
"this" is deprecated 2025-05-07T20:01:44.7433937Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.7434368Z ^ 2025-05-07T20:01:44.7434631Z detected during: 2025-05-07T20:01:44.7449586Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.7478297Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, 
cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.7507469Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.7523941Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:44.7525104Z 2025-05-07T20:01:44.7525348Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:44.7525703Z 2025-05-07T20:01:44.7526518Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.7527617Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.7528050Z ^ 2025-05-07T20:01:44.7528270Z detected during: 2025-05-07T20:01:44.7542447Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, 
SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:44.7571555Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const 
cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.7600770Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.7630019Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.7646547Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:44.7647704Z 2025-05-07T20:01:44.7648540Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.7649721Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.7650185Z ^ 2025-05-07T20:01:44.7650455Z detected during: 2025-05-07T20:01:44.7665625Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, 
cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.7696015Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.7725109Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.7741731Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, 
at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:44.7742910Z 2025-05-07T20:01:44.7743190Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:44.7743560Z 2025-05-07T20:01:44.7744364Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.7745487Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.7745893Z ^ 2025-05-07T20:01:44.7746150Z detected during: 2025-05-07T20:01:44.7760139Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:44.7789362Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.7817975Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, 
cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.7846943Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.7863507Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:44.7864647Z 2025-05-07T20:01:44.7876383Z [133/156] /github/home/miniconda/envs/build_binary/bin/nvcc 
-forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 
-Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o 2025-05-07T20:01:44.7888745Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:44.7890364Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.7891511Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.7892026Z ^ 2025-05-07T20:01:44.7892202Z 2025-05-07T20:01:44.7892474Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:44.7892838Z 2025-05-07T20:01:44.7893672Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.7894853Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:44.7895304Z ^ 2025-05-07T20:01:44.7895501Z 2025-05-07T20:01:44.7896365Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 
2025-05-07T20:01:44.7897587Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.7898050Z ^ 2025-05-07T20:01:44.7898343Z detected during: 2025-05-07T20:01:44.7913195Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.7941386Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, 
cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.7970211Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.7986469Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:44.7987662Z 2025-05-07T20:01:44.7987910Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:44.7988276Z 2025-05-07T20:01:44.7989108Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.7990222Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.7990651Z ^ 2025-05-07T20:01:44.7990880Z detected during: 2025-05-07T20:01:44.8004943Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, 
TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:44.8033538Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const 
cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.8061596Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.8090430Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.8106734Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:44.8107893Z 2025-05-07T20:01:44.8108719Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.8109873Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.8110346Z ^ 2025-05-07T20:01:44.8110616Z detected during: 2025-05-07T20:01:44.8125509Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.8153665Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.8182505Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, 
cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.8198756Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, 
at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:44.8199909Z 2025-05-07T20:01:44.8200186Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:44.8200552Z 2025-05-07T20:01:44.8201358Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.8202505Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.8202918Z ^ 2025-05-07T20:01:44.8203210Z detected during: 2025-05-07T20:01:44.8217388Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, 
cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:44.8246092Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.8274374Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.8303223Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.8319675Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:44.8320827Z 2025-05-07T20:01:44.8321631Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.8322802Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.8323269Z ^ 2025-05-07T20:01:44.8323535Z detected during: 2025-05-07T20:01:44.8338428Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.8366511Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.8395461Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.8411713Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:44.8412909Z 2025-05-07T20:01:44.8413164Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:44.8413556Z 2025-05-07T20:01:44.8414362Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.8415497Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.8415904Z ^ 2025-05-07T20:01:44.8416160Z detected during: 2025-05-07T20:01:44.8430306Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, 
ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:44.8459233Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.8487529Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.8516214Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.8532640Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:44.8533846Z 2025-05-07T20:01:44.8534647Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.8535810Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.8536258Z ^ 2025-05-07T20:01:44.8536540Z detected during: 2025-05-07T20:01:44.8551313Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.8579818Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.8608336Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, 
void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.8624625Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:44.8625805Z 2025-05-07T20:01:44.8626063Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:44.8626423Z 2025-05-07T20:01:44.8627332Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.8628491Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.8628909Z ^ 2025-05-07T20:01:44.8629174Z detected during: 2025-05-07T20:01:44.8643245Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:44.8672071Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.8700246Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.8734609Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.8751271Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:44.8752467Z 2025-05-07T20:01:44.8753283Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.8754463Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.8754916Z ^ 2025-05-07T20:01:44.8755210Z detected during: 2025-05-07T20:01:44.8770252Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.8798492Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, 
void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.8827210Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.8843501Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:44.8844664Z 2025-05-07T20:01:44.8844916Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:44.8845312Z 2025-05-07T20:01:44.8846118Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.8847260Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 
2025-05-07T20:01:44.8847670Z ^ 2025-05-07T20:01:44.8847927Z detected during: 2025-05-07T20:01:44.8862097Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:44.8891151Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.8919421Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.8948372Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.8964568Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:44.8965751Z 2025-05-07T20:01:44.8966559Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.8967913Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.8968368Z ^ 2025-05-07T20:01:44.8968666Z detected during: 2025-05-07T20:01:44.8983552Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.9011703Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.9040339Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.9056663Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:44.9057889Z 2025-05-07T20:01:44.9058140Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:44.9058520Z 2025-05-07T20:01:44.9059326Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:44.9060445Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:44.9060871Z ^ 2025-05-07T20:01:44.9061125Z detected during: 2025-05-07T20:01:44.9075363Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, 
SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:44.9104168Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:44.9132248Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:44.9160818Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:44.9177357Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:44.9178537Z 2025-05-07T20:01:46.7248364Z [134/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o 2025-05-07T20:01:46.7260921Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:46.7262507Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:46.7263690Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:46.7264161Z ^ 2025-05-07T20:01:46.7264336Z 2025-05-07T20:01:46.7264594Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:46.7264981Z 2025-05-07T20:01:46.7265815Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:46.7267000Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:46.7267664Z ^ 2025-05-07T20:01:46.7267840Z 2025-05-07T20:01:50.6378377Z [135/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o 2025-05-07T20:01:50.6391098Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:50.6392717Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.6393885Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.6394367Z ^ 2025-05-07T20:01:50.6394543Z 2025-05-07T20:01:50.6394795Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:50.6395175Z 2025-05-07T20:01:50.6395997Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.6397192Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:50.6397638Z ^ 2025-05-07T20:01:50.6397831Z 2025-05-07T20:01:50.6398629Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.6399801Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.6400243Z ^ 2025-05-07T20:01:50.6400587Z detected during: 2025-05-07T20:01:50.6415489Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.6443739Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.6472642Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.6488980Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:50.6490145Z 2025-05-07T20:01:50.6490397Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:50.6490762Z 2025-05-07T20:01:50.6491569Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.6492719Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.6493178Z ^ 2025-05-07T20:01:50.6493410Z detected during: 2025-05-07T20:01:50.6507575Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:50.6536293Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.6564448Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.6593315Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.6609655Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 
2025-05-07T20:01:50.6610831Z 2025-05-07T20:01:50.6611664Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.6612813Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.6613281Z ^ 2025-05-07T20:01:50.6613552Z detected during: 2025-05-07T20:01:50.6628469Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.6656752Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.6685514Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.6703359Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:50.6704541Z 2025-05-07T20:01:50.6704802Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:50.6705188Z 2025-05-07T20:01:50.6705993Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.6707105Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.6707502Z ^ 2025-05-07T20:01:50.6707741Z detected during: 2025-05-07T20:01:50.6721824Z instantiation of "auto 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, 
ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:50.6750502Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.6778920Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.6807697Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.6823925Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:50.6825067Z 2025-05-07T20:01:50.6825900Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.6827045Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.6827491Z ^ 2025-05-07T20:01:50.6827739Z detected during: 2025-05-07T20:01:50.6842409Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, 
cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.6870778Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.6899383Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.6915660Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:50.6916795Z 2025-05-07T20:01:50.6917035Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:50.6917404Z 2025-05-07T20:01:50.6918204Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.6919307Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.6919708Z ^ 2025-05-07T20:01:50.6919956Z detected during: 2025-05-07T20:01:50.6934053Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:50.6962786Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.6991160Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.7020485Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.7037699Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:50.7038863Z 2025-05-07T20:01:50.7039663Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.7040811Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.7041244Z ^ 2025-05-07T20:01:50.7041507Z detected during: 2025-05-07T20:01:50.7056198Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.7084453Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.7113064Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.7129203Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:50.7130359Z 2025-05-07T20:01:50.7130604Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:50.7130958Z 2025-05-07T20:01:50.7131760Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 
2025-05-07T20:01:50.7132858Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.7133261Z ^ 2025-05-07T20:01:50.7133479Z detected during: 2025-05-07T20:01:50.7147598Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:50.7176461Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.7204533Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.7232981Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.7249186Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:50.7250324Z 2025-05-07T20:01:50.7251135Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.7252267Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.7263122Z ^ 2025-05-07T20:01:50.7263529Z detected during: 2025-05-07T20:01:50.7278753Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, 
cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.7307107Z instantiation of "void 
cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.7335665Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.7351894Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:50.7353082Z 2025-05-07T20:01:50.7353351Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:50.7353711Z 2025-05-07T20:01:50.7354512Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.7355634Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.7356043Z ^ 2025-05-07T20:01:50.7356262Z detected during: 2025-05-07T20:01:50.7371401Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, 
SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:50.7400062Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.7428143Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.7456705Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.7473079Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:50.7474283Z 2025-05-07T20:01:50.7475136Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.7476280Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.7476727Z ^ 2025-05-07T20:01:50.7476979Z detected during: 2025-05-07T20:01:50.7491771Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.7519876Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.7548347Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.7564688Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:50.7565822Z 2025-05-07T20:01:50.7566063Z Remark: The warnings can be 
suppressed with "-diag-suppress " 2025-05-07T20:01:50.7566430Z 2025-05-07T20:01:50.7567231Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.7568522Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.7568917Z ^ 2025-05-07T20:01:50.7569149Z detected during: 2025-05-07T20:01:50.7583296Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:50.7611945Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.7640023Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.7668741Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.7685811Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:50.7686978Z 2025-05-07T20:01:52.9440368Z [136/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o -MF 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o 2025-05-07T20:01:52.9462245Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:52.9465044Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.9467054Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:52.9468092Z ^ 2025-05-07T20:01:52.9468391Z 2025-05-07T20:01:52.9468795Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:52.9469678Z 2025-05-07T20:01:52.9471259Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.9473321Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:52.9474083Z ^ 2025-05-07T20:01:52.9474352Z 2025-05-07T20:01:52.9475725Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.9477732Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:52.9478530Z ^ 2025-05-07T20:01:52.9478976Z detected during: 2025-05-07T20:01:52.9504887Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, 
TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:52.9554434Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:52.9604406Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, 
cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:52.9632769Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:52.9634999Z 2025-05-07T20:01:52.9635402Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:52.9636063Z 2025-05-07T20:01:52.9637489Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.9639477Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:52.9640187Z ^ 2025-05-07T20:01:52.9640583Z detected during: 2025-05-07T20:01:52.9665932Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, 
SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:52.9716349Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:52.9765823Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:52.9818264Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, 
void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:52.9847006Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:52.9849140Z 2025-05-07T20:01:52.9850585Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.9852704Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:52.9853474Z ^ 2025-05-07T20:01:52.9853958Z detected during: 2025-05-07T20:01:52.9880765Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:52.9931441Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:52.9981519Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.0010088Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:53.0012156Z 2025-05-07T20:01:53.0012630Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:53.0013239Z 2025-05-07T20:01:53.0014606Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.0016535Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.0017367Z ^ 2025-05-07T20:01:53.0017743Z detected during: 2025-05-07T20:01:53.0042346Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, 
SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:53.0093407Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.0143044Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.0194530Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.0223495Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:53.0225625Z 2025-05-07T20:01:53.0227061Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.0229245Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.0230030Z ^ 2025-05-07T20:01:53.0230498Z detected during: 2025-05-07T20:01:53.0256600Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const 
cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.0309792Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.0365304Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.0394518Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:53.0396594Z 2025-05-07T20:01:53.0397036Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:53.0397680Z 2025-05-07T20:01:53.0398987Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.0401145Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.0401844Z ^ 2025-05-07T20:01:53.0402364Z detected during: 2025-05-07T20:01:53.0427798Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) 
const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:53.0479994Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.0530748Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.0582270Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, 
void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.0612614Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:53.0614931Z 2025-05-07T20:01:53.0616473Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.0618762Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.0619590Z ^ 2025-05-07T20:01:53.0620015Z detected during: 2025-05-07T20:01:53.0647991Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.0700191Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.0753732Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.0784073Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:53.0786400Z 2025-05-07T20:01:53.0786955Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:53.0787589Z 2025-05-07T20:01:53.0788985Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.0791094Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.0791867Z ^ 2025-05-07T20:01:53.0792322Z detected during: 2025-05-07T20:01:53.0819266Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, 
SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:53.0873660Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.0927612Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.0977770Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.0994269Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:53.0995462Z 2025-05-07T20:01:53.0996304Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.0997502Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.0997953Z ^ 2025-05-07T20:01:53.0998243Z detected during: 2025-05-07T20:01:53.1013117Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const 
cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.1041398Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.1070551Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.1086826Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:53.1088014Z 2025-05-07T20:01:53.1088270Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:53.1088635Z 2025-05-07T20:01:53.1089474Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.1090602Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.1091037Z ^ 2025-05-07T20:01:53.1091267Z detected during: 2025-05-07T20:01:53.1105448Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) 
const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:53.1134264Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.1162724Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.1191622Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, 
void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.1207803Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:53.1208951Z 2025-05-07T20:01:53.1209769Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.1210903Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.1211354Z ^ 2025-05-07T20:01:53.1211606Z detected during: 2025-05-07T20:01:53.1226400Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.1254619Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.1283449Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.1300447Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:53.1301604Z 2025-05-07T20:01:53.1301878Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:53.1302241Z 2025-05-07T20:01:53.1303043Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:53.1304192Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:53.1304642Z ^ 2025-05-07T20:01:53.1304890Z detected during: 2025-05-07T20:01:53.1318900Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, 
SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:53.1347645Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:53.1375804Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:53.1404479Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:53.1420775Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:53.1421932Z 2025-05-07T20:01:55.5271978Z [137/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o 2025-05-07T20:01:55.5286378Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:55.5288024Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.5289187Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.5289641Z ^ 2025-05-07T20:01:55.5289808Z 2025-05-07T20:01:55.5290050Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.5290421Z 2025-05-07T20:01:55.5291235Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.5292394Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:55.5292820Z ^ 2025-05-07T20:01:55.5292984Z 2025-05-07T20:01:55.5293780Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.5294902Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.5295342Z ^ 2025-05-07T20:01:55.5295601Z detected during: 2025-05-07T20:01:55.5310410Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.5338537Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.5366918Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.5383205Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:55.5384351Z 2025-05-07T20:01:55.5384610Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.5384962Z 2025-05-07T20:01:55.5385760Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.5386943Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.5387359Z ^ 2025-05-07T20:01:55.5387577Z detected during: 2025-05-07T20:01:55.5401585Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, 
ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:55.5429978Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.5457910Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.5486452Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:01:55.5502596Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:55.5503760Z 2025-05-07T20:01:55.5504557Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.5505736Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.5506165Z ^ 2025-05-07T20:01:55.5506451Z detected during: 2025-05-07T20:01:55.5521075Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.5549033Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.5577598Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.5593805Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:55.5594998Z 2025-05-07T20:01:55.5595240Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.5595596Z 2025-05-07T20:01:55.5596404Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" 
is deprecated 2025-05-07T20:01:55.5597501Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.5597911Z ^ 2025-05-07T20:01:55.5598123Z detected during: 2025-05-07T20:01:55.5613033Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:55.5641597Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.5669798Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.5698283Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, 
cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.5714410Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:55.5715592Z 2025-05-07T20:01:55.5716450Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.5717601Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.5718071Z ^ 2025-05-07T20:01:55.5718345Z detected during: 2025-05-07T20:01:55.5733019Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.5760920Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.5789385Z instantiation of "cutlass::Status 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.5805510Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:55.5806654Z 2025-05-07T20:01:55.5806910Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.5807269Z 2025-05-07T20:01:55.5808061Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.5809198Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.5809603Z ^ 2025-05-07T20:01:55.5809819Z detected during: 2025-05-07T20:01:55.5823903Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:55.5852365Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.5880384Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.5909159Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.5925280Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:55.5926421Z 2025-05-07T20:01:55.5927218Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.5928367Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.5928812Z ^ 2025-05-07T20:01:55.5929062Z detected during: 2025-05-07T20:01:55.5944587Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.5972700Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.6001071Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, 
cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.6017240Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:55.6018415Z 2025-05-07T20:01:55.6018661Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.6019028Z 2025-05-07T20:01:55.6019831Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.6020943Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.6021339Z ^ 
2025-05-07T20:01:55.6021565Z detected during: 2025-05-07T20:01:55.6035465Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, 
SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:55.6063943Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.6092036Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.6120397Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.6136525Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:55.6137741Z 2025-05-07T20:01:55.6138534Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.6139693Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.6140131Z ^ 2025-05-07T20:01:55.6140402Z detected during: 2025-05-07T20:01:55.6155016Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.6183129Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.6211568Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, 
void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.6227705Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:55.6228884Z 2025-05-07T20:01:55.6229128Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.6229481Z 2025-05-07T20:01:55.6230294Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.6231395Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.6231794Z ^ 2025-05-07T20:01:55.6232007Z detected during: 2025-05-07T20:01:55.6245932Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:55.6275423Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.6303483Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.6331871Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.6347993Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:55.6349153Z 2025-05-07T20:01:55.6349949Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.6351085Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.6351529Z ^ 2025-05-07T20:01:55.6351801Z detected during: 2025-05-07T20:01:55.6366455Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.6394595Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.6423007Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, 
cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.6439171Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:55.6440321Z 2025-05-07T20:01:55.6440575Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.6440930Z 2025-05-07T20:01:55.6441732Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.6442845Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.6443249Z ^ 
2025-05-07T20:01:55.6443466Z detected during: 2025-05-07T20:01:55.6457602Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, 
SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:55.6486200Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.6514030Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.6542456Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.6558538Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:55.6559686Z 2025-05-07T20:01:55.9390813Z [138/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o 2025-05-07T20:01:55.9403506Z nvcc warning : Support for offline compilation for architectures prior 
to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:55.9405125Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.9406313Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.9406770Z ^ 2025-05-07T20:01:55.9406965Z 2025-05-07T20:01:55.9407215Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.9407584Z 2025-05-07T20:01:55.9408413Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.9409602Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:55.9410077Z ^ 2025-05-07T20:01:55.9410250Z 2025-05-07T20:01:55.9411053Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.9412207Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.9414341Z ^ 2025-05-07T20:01:55.9414617Z detected during: 2025-05-07T20:01:55.9429476Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.9457619Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, 
int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.9486425Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, 
cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.9502804Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:55.9503987Z 2025-05-07T20:01:55.9504243Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.9504636Z 2025-05-07T20:01:55.9505438Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.9506625Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.9507033Z ^ 2025-05-07T20:01:55.9507284Z detected during: 2025-05-07T20:01:55.9521346Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:55.9550243Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.9578415Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.9607063Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.9623306Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:55.9624523Z 2025-05-07T20:01:55.9625331Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.9626505Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.9626960Z ^ 2025-05-07T20:01:55.9627248Z detected during: 2025-05-07T20:01:55.9642065Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.9670177Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, 
void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.9698700Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.9714913Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:55.9716112Z 2025-05-07T20:01:55.9716386Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.9716748Z 2025-05-07T20:01:55.9717574Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.9718680Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 
2025-05-07T20:01:55.9719114Z ^ 2025-05-07T20:01:55.9719345Z detected during: 2025-05-07T20:01:55.9733403Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:55.9762890Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.9791091Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.9819755Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.9835980Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:55.9837138Z 2025-05-07T20:01:55.9837987Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.9839141Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.9839619Z ^ 2025-05-07T20:01:55.9839890Z detected during: 2025-05-07T20:01:55.9854642Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:55.9882872Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:55.9911377Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:55.9927616Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:55.9928781Z 2025-05-07T20:01:55.9929055Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:55.9929418Z 2025-05-07T20:01:55.9930220Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:55.9931360Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:55.9931773Z ^ 2025-05-07T20:01:55.9932022Z detected during: 2025-05-07T20:01:55.9946082Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, 
SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:55.9974741Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:56.0002920Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:56.0031517Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:56.0047730Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:56.0048892Z 2025-05-07T20:01:56.0049696Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.0050865Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:56.0051336Z ^ 2025-05-07T20:01:56.0051600Z detected during: 2025-05-07T20:01:56.0066442Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:56.0095498Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:56.0124102Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:56.0140496Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:56.0141673Z 2025-05-07T20:01:56.0141925Z Remark: The warnings can be 
suppressed with "-diag-suppress " 2025-05-07T20:01:56.0142307Z 2025-05-07T20:01:56.0143113Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.0144245Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:56.0144663Z ^ 2025-05-07T20:01:56.0144922Z detected during: 2025-05-07T20:01:56.0159160Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:56.0187878Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:56.0216194Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:56.0244742Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:56.0260953Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:56.0262128Z 2025-05-07T20:01:56.0262937Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.0264106Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 
2025-05-07T20:01:56.0264552Z ^ 2025-05-07T20:01:56.0264837Z detected during: 2025-05-07T20:01:56.0279712Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:56.0307922Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:56.0336427Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:56.0352611Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:56.0353786Z 2025-05-07T20:01:56.0354037Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:56.0354397Z 2025-05-07T20:01:56.0355218Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.0356361Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:56.0356788Z ^ 2025-05-07T20:01:56.0357026Z detected during: 2025-05-07T20:01:56.0371147Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, 
StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:56.0400637Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:01:56.0428798Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 
340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:56.0457463Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:56.0473822Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:56.0474977Z 2025-05-07T20:01:56.0475818Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.0477052Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:56.0477567Z ^ 2025-05-07T20:01:56.0477838Z detected during: 2025-05-07T20:01:56.0492696Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:56.0520693Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:56.0549209Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:56.0565593Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, 
USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:56.0566771Z 2025-05-07T20:01:56.0567038Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:56.0567534Z 2025-05-07T20:01:56.0568338Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:56.0569484Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:56.0569912Z ^ 2025-05-07T20:01:56.0570145Z detected during: 2025-05-07T20:01:56.0584233Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:56.0613078Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:56.0641145Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:56.0669810Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:56.0686121Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:56.0687304Z 2025-05-07T20:01:59.3129743Z [139/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO 
-DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion 
-Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o 2025-05-07T20:01:59.3152238Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:59.3154933Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.3156901Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.3157651Z ^ 2025-05-07T20:01:59.3157919Z 2025-05-07T20:01:59.3158362Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.3158993Z 2025-05-07T20:01:59.3160394Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.3162606Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:59.3163407Z ^ 2025-05-07T20:01:59.3163688Z 2025-05-07T20:01:59.3165201Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.3167251Z Tensor mSFB_tmp = 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.3168232Z ^ 2025-05-07T20:01:59.3168675Z detected during: 2025-05-07T20:01:59.3194635Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.3244384Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.3295222Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.3323729Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:59.3325775Z 2025-05-07T20:01:59.3326218Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.3326850Z 2025-05-07T20:01:59.3328353Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.3330318Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.3331046Z ^ 2025-05-07T20:01:59.3331402Z detected during: 2025-05-07T20:01:59.3356169Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, 
ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.3407576Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.3457614Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.3508923Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.3537469Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:59.3539324Z 2025-05-07T20:01:59.3540664Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.3542650Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.3543410Z ^ 2025-05-07T20:01:59.3543857Z detected during: 2025-05-07T20:01:59.3570042Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.3619916Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.3670909Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.3701231Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:59.3703364Z 2025-05-07T20:01:59.3703831Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.3704496Z 2025-05-07T20:01:59.3705909Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.3707866Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.3708567Z ^ 2025-05-07T20:01:59.3708950Z detected during: 2025-05-07T20:01:59.3733956Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, 
GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.3785172Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.3834862Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.3898658Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.3927421Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:59.3929524Z 2025-05-07T20:01:59.3930943Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 
2025-05-07T20:01:59.3933011Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.3933780Z ^ 2025-05-07T20:01:59.3934211Z detected during: 2025-05-07T20:01:59.3960493Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.4010917Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.4060762Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.4089583Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:59.4091698Z 2025-05-07T20:01:59.4092127Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.4092768Z 2025-05-07T20:01:59.4094116Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.4096042Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.4096723Z ^ 2025-05-07T20:01:59.4097141Z detected during: 2025-05-07T20:01:59.4122223Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.4172934Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.4222490Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.4273231Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.4303479Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:59.4305472Z 2025-05-07T20:01:59.4306893Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.4308887Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.4309649Z ^ 2025-05-07T20:01:59.4310069Z detected during: 2025-05-07T20:01:59.4335948Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.4386172Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.4437033Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.4465967Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:59.4468175Z 2025-05-07T20:01:59.4468601Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.4469225Z 2025-05-07T20:01:59.4470644Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.4472635Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.4473320Z ^ 2025-05-07T20:01:59.4473685Z detected during: 2025-05-07T20:01:59.4499236Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, 
GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.4548784Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.4598483Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.4649094Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.4673784Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:59.4674956Z 2025-05-07T20:01:59.4675773Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 
2025-05-07T20:01:59.4677168Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.4677613Z ^ 2025-05-07T20:01:59.4677887Z detected during: 2025-05-07T20:01:59.4692547Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.4720623Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.4748987Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.4765195Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:59.4766343Z 2025-05-07T20:01:59.4766589Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.4766943Z 2025-05-07T20:01:59.4767925Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.4769133Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.4769577Z ^ 2025-05-07T20:01:59.4769836Z detected during: 2025-05-07T20:01:59.4783835Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.4812256Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.4841083Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.4869738Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.4885872Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:59.4887058Z 2025-05-07T20:01:59.4887870Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.4889040Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.4889526Z ^ 2025-05-07T20:01:59.4889800Z detected during: 2025-05-07T20:01:59.4904529Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.4932454Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.4961000Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.4977340Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:59.4978569Z 2025-05-07T20:01:59.4978854Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.4979238Z 2025-05-07T20:01:59.4980046Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.4981188Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.4981605Z ^ 2025-05-07T20:01:59.4981865Z detected during: 2025-05-07T20:01:59.4995905Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, 
GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.5024430Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.5052478Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.5081054Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.5097325Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:59.5098478Z 2025-05-07T20:01:59.7080985Z [140/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o 2025-05-07T20:01:59.7093729Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:59.7095336Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7096486Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7096961Z ^ 2025-05-07T20:01:59.7097229Z 2025-05-07T20:01:59.7097482Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.7097939Z 2025-05-07T20:01:59.7098771Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7099969Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:59.7100417Z ^ 2025-05-07T20:01:59.7100613Z 2025-05-07T20:01:59.7101411Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7102573Z Tensor mSFB_tmp = 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7103017Z ^ 2025-05-07T20:01:59.7103338Z detected during: 2025-05-07T20:01:59.7118171Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.7146350Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.7175174Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7191628Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:59.7192814Z 2025-05-07T20:01:59.7193065Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.7193432Z 2025-05-07T20:01:59.7194272Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7195390Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7195816Z ^ 2025-05-07T20:01:59.7196051Z detected during: 2025-05-07T20:01:59.7210163Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, 
ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.7238999Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.7267116Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.7297537Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7313951Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:59.7315113Z 2025-05-07T20:01:59.7315939Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7317095Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7317566Z ^ 2025-05-07T20:01:59.7317835Z detected during: 2025-05-07T20:01:59.7332691Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.7360846Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.7389639Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7406029Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:59.7407183Z 2025-05-07T20:01:59.7407463Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.7407832Z 2025-05-07T20:01:59.7408642Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7409792Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7410201Z ^ 2025-05-07T20:01:59.7410459Z detected during: 2025-05-07T20:01:59.7424580Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, 
GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.7453192Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.7481560Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.7513105Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7529474Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:59.7530651Z 2025-05-07T20:01:59.7531458Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 
2025-05-07T20:01:59.7532634Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7533090Z ^ 2025-05-07T20:01:59.7533378Z detected during: 2025-05-07T20:01:59.7548184Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.7576490Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.7605809Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7622110Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:59.7623277Z 2025-05-07T20:01:59.7623532Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.7623897Z 2025-05-07T20:01:59.7624702Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7625837Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7626266Z ^ 2025-05-07T20:01:59.7626500Z detected during: 2025-05-07T20:01:59.7640507Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.7669442Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.7697620Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.7726235Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7742477Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:59.7743614Z 2025-05-07T20:01:59.7744445Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7745621Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7746094Z ^ 2025-05-07T20:01:59.7746354Z detected during: 2025-05-07T20:01:59.7761221Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.7789553Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.7818236Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7834485Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:59.7835623Z 2025-05-07T20:01:59.7835879Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.7836234Z 2025-05-07T20:01:59.7837035Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.7838169Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7838558Z ^ 2025-05-07T20:01:59.7838786Z detected during: 2025-05-07T20:01:59.7852882Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, 
GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.7881731Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.7909854Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.7939501Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.7955786Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:59.7956959Z 2025-05-07T20:01:59.7957763Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 
2025-05-07T20:01:59.7958920Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.7959401Z ^ 2025-05-07T20:01:59.7959674Z detected during: 2025-05-07T20:01:59.7974709Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.8003064Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.8031646Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.8047869Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:59.8049036Z 2025-05-07T20:01:59.8049279Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.8049652Z 2025-05-07T20:01:59.8050477Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.8051592Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.8051985Z ^ 2025-05-07T20:01:59.8052211Z detected during: 2025-05-07T20:01:59.8066306Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.8095015Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.8123035Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.8151567Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.8167932Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:59.8169075Z 2025-05-07T20:01:59.8169863Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.8171009Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.8171441Z ^ 2025-05-07T20:01:59.8171707Z detected during: 2025-05-07T20:01:59.8186474Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.8214525Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.8243233Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.8260215Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:59.8261401Z 2025-05-07T20:01:59.8261657Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.8262049Z 2025-05-07T20:01:59.8262855Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.8263986Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.8264423Z ^ 2025-05-07T20:01:59.8264689Z detected during: 2025-05-07T20:01:59.8278968Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, 
GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:59.8307698Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:59.8335989Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:59.8364837Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:59.8381212Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:59.8382395Z 2025-05-07T20:02:09.1781385Z [141/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o 2025-05-07T20:02:09.1793963Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:09.1795625Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:09.1796770Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:09.1797231Z ^ 2025-05-07T20:02:09.1797396Z 2025-05-07T20:02:09.1797640Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:09.1798013Z 2025-05-07T20:02:09.1798832Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:09.1799991Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:09.1800416Z ^ 2025-05-07T20:02:09.1800594Z 2025-05-07T20:02:09.1801385Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:09.1802521Z Tensor mSFB_tmp = 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:09.1802952Z ^ 2025-05-07T20:02:09.1803212Z detected during: 2025-05-07T20:02:09.1817907Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:09.1847123Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:09.1875776Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:09.1891837Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:09.1893012Z 2025-05-07T20:02:09.1893253Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:09.1893609Z 2025-05-07T20:02:09.1894408Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:09.1895522Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:09.1895924Z ^ 2025-05-07T20:02:09.1896139Z detected during: 2025-05-07T20:02:09.1910205Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, 
ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:09.1938819Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:09.1966714Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:09.1995111Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:09.2011197Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:09.2012336Z 2025-05-07T20:02:09.2013145Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:09.2014276Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:09.2014722Z ^ 2025-05-07T20:02:09.2014968Z detected during: 2025-05-07T20:02:09.2029627Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:09.2057573Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:09.2086135Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:09.2102295Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:09.2103437Z 2025-05-07T20:02:09.2103678Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:09.2104053Z 2025-05-07T20:02:09.2104849Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:09.2105963Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:09.2106388Z ^ 2025-05-07T20:02:09.2106616Z detected during: 2025-05-07T20:02:09.2120618Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, 
GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:09.2149161Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:09.2178004Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:09.2206230Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:09.2222428Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:09.2223575Z 2025-05-07T20:02:09.2224366Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 
2025-05-07T20:02:09.2225497Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:09.2225932Z ^ 2025-05-07T20:02:09.2226192Z detected during: 2025-05-07T20:02:09.2240918Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:09.2268982Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:09.2297324Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:09.2313497Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:09.2314649Z 2025-05-07T20:02:09.2314888Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:09.2315242Z 2025-05-07T20:02:09.2316046Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:09.2317170Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:09.2317599Z ^ 2025-05-07T20:02:09.2317832Z detected during: 2025-05-07T20:02:09.2331769Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:09.2360191Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:09.2388335Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:09.2416636Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:09.2432720Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:09.2433868Z 2025-05-07T20:02:09.2435100Z ptxas /tmp/tmpxft_00008c98_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 835; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:09.2437707Z ptxas /tmp/tmpxft_00008c98_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 848; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:09.2440267Z ptxas /tmp/tmpxft_00008c98_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 988; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:09.2442809Z ptxas 
/tmp/tmpxft_00008c98_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 1001; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:09.2444925Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:09.2446068Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:09.2446500Z ^ 2025-05-07T20:02:09.2446791Z detected during: 2025-05-07T20:02:09.2461481Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:09.2490156Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:09.2518494Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:09.2534674Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:09.2535824Z 2025-05-07T20:02:09.2536064Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:09.2536459Z 2025-05-07T20:02:09.2537294Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" 
is deprecated 2025-05-07T20:02:09.2538410Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:09.2538806Z ^ 2025-05-07T20:02:09.2539043Z detected during: 2025-05-07T20:02:09.2552962Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:09.2581508Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:09.2609484Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:09.2637937Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, 
cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:09.2654075Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:09.2655233Z 2025-05-07T20:02:09.2656027Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:09.2657223Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:09.2657705Z ^ 2025-05-07T20:02:09.2657968Z detected during: 2025-05-07T20:02:09.2672709Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:09.2700781Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:09.2728857Z instantiation of "cutlass::Status 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:09.2745050Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:09.2746199Z 2025-05-07T20:02:09.2746439Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:09.2746793Z 2025-05-07T20:02:09.2747601Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:09.2748697Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:09.2749129Z ^ 2025-05-07T20:02:09.2749341Z detected during: 2025-05-07T20:02:09.2763247Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:09.2791799Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:09.2820204Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:09.2848450Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:09.2864620Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:09.2865757Z 2025-05-07T20:02:09.2866603Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:09.2867880Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:09.2868329Z ^ 2025-05-07T20:02:09.2868577Z detected during: 2025-05-07T20:02:09.2883206Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:09.2911027Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:09.2939137Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, 
cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:09.2955154Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:09.2956291Z 2025-05-07T20:02:09.2956545Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:09.2956895Z 2025-05-07T20:02:09.2957689Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:09.2958824Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:09.2959218Z ^ 2025-05-07T20:02:09.2959444Z 
detected during: 2025-05-07T20:02:09.2973504Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, 
TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:09.3001875Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:09.3029820Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:09.3057994Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:09.3074223Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:09.3075426Z 2025-05-07T20:02:10.0532009Z [142/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o 2025-05-07T20:02:10.0544813Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use 
-Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:10.0546419Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.0547573Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.0548042Z ^ 2025-05-07T20:02:10.0548213Z 2025-05-07T20:02:10.0548464Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.0548847Z 2025-05-07T20:02:10.0549677Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.0550864Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:10.0551399Z ^ 2025-05-07T20:02:10.0551597Z 2025-05-07T20:02:10.0553788Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.0555011Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.0555462Z ^ 2025-05-07T20:02:10.0555753Z detected during: 2025-05-07T20:02:10.0570594Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, 
cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.0598596Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.0627087Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.0643526Z instantiation of 
"at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.0644696Z 2025-05-07T20:02:10.0644945Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.0645332Z 2025-05-07T20:02:10.0646137Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.0647270Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.0647723Z ^ 2025-05-07T20:02:10.0647955Z detected during: 2025-05-07T20:02:10.0661915Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, 
cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.0690553Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.0718371Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.0746890Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.0763040Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.0764223Z 2025-05-07T20:02:10.0765048Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.0766203Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.0766672Z ^ 2025-05-07T20:02:10.0766937Z detected during: 2025-05-07T20:02:10.0781778Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.0809644Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.0837915Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.0854056Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.0855244Z 2025-05-07T20:02:10.0855491Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.0855876Z 2025-05-07T20:02:10.0856710Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.0857891Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.0858295Z ^ 2025-05-07T20:02:10.0858548Z detected during: 2025-05-07T20:02:10.0872551Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:02:10.0901956Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.0930006Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.0958536Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.0974836Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.0976020Z 2025-05-07T20:02:10.0976825Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.0978052Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.0978513Z ^ 2025-05-07T20:02:10.0978814Z detected during: 2025-05-07T20:02:10.0993427Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1021463Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1049690Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1065750Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1066939Z 2025-05-07T20:02:10.1067182Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.1067675Z 2025-05-07T20:02:10.1068480Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1069577Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1069978Z ^ 2025-05-07T20:02:10.1070194Z detected during: 2025-05-07T20:02:10.1084127Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.1112584Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1140482Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1168902Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1185043Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1186185Z 2025-05-07T20:02:10.1187427Z ptxas /tmp/tmpxft_00008c96_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 835; warning : Advisory: 
'.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:10.1189972Z ptxas /tmp/tmpxft_00008c96_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 848; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:10.1192505Z ptxas /tmp/tmpxft_00008c96_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 988; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:10.1195051Z ptxas /tmp/tmpxft_00008c96_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 1001; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:10.1197202Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1198355Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1198800Z ^ 2025-05-07T20:02:10.1199064Z detected during: 2025-05-07T20:02:10.1214303Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with 
ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:02:10.1242180Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1270660Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1286941Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1288093Z 2025-05-07T20:02:10.1288371Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.1288737Z 2025-05-07T20:02:10.1289546Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1290674Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1291077Z ^ 2025-05-07T20:02:10.1291293Z detected during: 2025-05-07T20:02:10.1305267Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, 
ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.1333810Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1361697Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1390085Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:10.1406140Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1407310Z 2025-05-07T20:02:10.1408129Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1409275Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1409718Z ^ 2025-05-07T20:02:10.1409983Z detected during: 2025-05-07T20:02:10.1424668Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1452349Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1480706Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1497099Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1498244Z 2025-05-07T20:02:10.1498485Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.1498849Z 2025-05-07T20:02:10.1499649Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" 
is deprecated 2025-05-07T20:02:10.1500758Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1501157Z ^ 2025-05-07T20:02:10.1501396Z detected during: 2025-05-07T20:02:10.1515277Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.1544148Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1571976Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1600229Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, 
cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1616386Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1617580Z 2025-05-07T20:02:10.1618375Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1619519Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1619951Z ^ 2025-05-07T20:02:10.1620215Z detected during: 2025-05-07T20:02:10.1634832Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1662661Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1690952Z instantiation of "cutlass::Status 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1707001Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1708165Z 2025-05-07T20:02:10.1708406Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.1708759Z 2025-05-07T20:02:10.1709566Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1710662Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1711059Z ^ 2025-05-07T20:02:10.1711271Z detected during: 2025-05-07T20:02:10.1725137Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.1764006Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1791987Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1820217Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1836358Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1837509Z 2025-05-07T20:02:13.4223020Z [143/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o 2025-05-07T20:02:13.4235520Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:13.4237099Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.4238233Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.4238755Z ^ 2025-05-07T20:02:13.4238923Z 2025-05-07T20:02:13.4239170Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.4239540Z 2025-05-07T20:02:13.4240430Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.4241587Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:13.4242015Z ^ 2025-05-07T20:02:13.4242198Z 2025-05-07T20:02:13.4242993Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.4244147Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.4244582Z ^ 2025-05-07T20:02:13.4244842Z detected during: 2025-05-07T20:02:13.4261041Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:13.4289909Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:13.4318881Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:13.4335418Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:13.4336576Z 2025-05-07T20:02:13.4336818Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.4337214Z 2025-05-07T20:02:13.4338027Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.4339154Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.4339603Z ^ 2025-05-07T20:02:13.4339858Z detected during: 2025-05-07T20:02:13.4354922Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:13.4383586Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:13.4412654Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:13.4429076Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:13.4430256Z 2025-05-07T20:02:13.4430497Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.4430865Z 2025-05-07T20:02:13.4431688Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.4432837Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.4433270Z ^ 2025-05-07T20:02:13.4433534Z detected during: 2025-05-07T20:02:13.4448604Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:13.4477295Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:13.4506310Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:13.4522799Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:13.4523996Z 2025-05-07T20:02:13.4524324Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.4524678Z 2025-05-07T20:02:13.4525493Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.4526618Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.4527069Z ^ 2025-05-07T20:02:13.4527322Z detected during: 2025-05-07T20:02:13.4542335Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:13.4570935Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:13.4600671Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:13.4617235Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:13.4618390Z 2025-05-07T20:02:13.4618658Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.4619017Z 2025-05-07T20:02:13.4619819Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.4620979Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.4621427Z ^ 2025-05-07T20:02:13.4621710Z detected during: 
2025-05-07T20:02:13.4636671Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:13.4665355Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:13.4694512Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:13.4711009Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:13.4712195Z 2025-05-07T20:02:13.4712444Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.4712840Z 2025-05-07T20:02:13.4713653Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:13.4714825Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:13.4715327Z ^ 2025-05-07T20:02:13.4715615Z detected during: 2025-05-07T20:02:13.4730678Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:13.4759153Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:02:13.4788343Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:13.4804785Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:13.4805973Z 2025-05-07T20:02:13.4806243Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:13.4806606Z 2025-05-07T20:02:16.5107295Z [144/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o 2025-05-07T20:02:16.5119892Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:16.5121444Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.5122599Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.5123045Z ^ 2025-05-07T20:02:16.5123212Z 2025-05-07T20:02:16.5123455Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.5123820Z 2025-05-07T20:02:16.5124637Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.5125805Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:16.5126303Z ^ 2025-05-07T20:02:16.5126468Z 2025-05-07T20:02:16.5127272Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.5128432Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.5128870Z ^ 2025-05-07T20:02:16.5129127Z detected during: 2025-05-07T20:02:16.5144265Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:16.5174545Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:16.5203765Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:16.5220318Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:16.5221465Z 2025-05-07T20:02:16.5221725Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.5222078Z 2025-05-07T20:02:16.5222873Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.5224015Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.5224446Z ^ 2025-05-07T20:02:16.5224712Z detected during: 2025-05-07T20:02:16.5239592Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:16.5268148Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:16.5297543Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:16.5313970Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:16.5315092Z 2025-05-07T20:02:16.5315329Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.5315686Z 
2025-05-07T20:02:16.5316466Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.5317746Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.5318189Z ^ 2025-05-07T20:02:16.5318451Z detected during: 2025-05-07T20:02:16.5333444Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:16.5361901Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:16.5391086Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:16.5407229Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:16.5408352Z 2025-05-07T20:02:16.5408600Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.5408946Z 2025-05-07T20:02:16.5409729Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.5410865Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.5411295Z ^ 2025-05-07T20:02:16.5411538Z detected during: 2025-05-07T20:02:16.5426607Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:16.5455435Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:16.5485327Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:16.5501691Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:16.5502839Z 2025-05-07T20:02:16.5503082Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.5503446Z 2025-05-07T20:02:16.5504241Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.5505377Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.5505809Z ^ 2025-05-07T20:02:16.5506069Z detected during: 2025-05-07T20:02:16.5520987Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:16.5549261Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:16.5578576Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:16.5594935Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:16.5596065Z 2025-05-07T20:02:16.5596302Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.5596652Z 2025-05-07T20:02:16.5597429Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.5598549Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.5598982Z ^ 2025-05-07T20:02:16.5599227Z detected during: 2025-05-07T20:02:16.5613899Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:16.5642460Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:16.5671598Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:16.5688225Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:16.5689371Z 2025-05-07T20:02:16.5689610Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.5689973Z 2025-05-07T20:02:16.8427510Z [145/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o 2025-05-07T20:02:16.8440147Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:16.8441734Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.8442913Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.8443398Z ^ 2025-05-07T20:02:16.8443575Z 2025-05-07T20:02:16.8443831Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.8444196Z 2025-05-07T20:02:16.8445045Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.8446220Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:16.8446690Z ^ 2025-05-07T20:02:16.8446860Z 2025-05-07T20:02:16.8447678Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.8448803Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.8449245Z ^ 2025-05-07T20:02:16.8449492Z detected during: 2025-05-07T20:02:16.8464748Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:16.8493725Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:16.8522638Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:16.8540393Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:16.8541616Z 2025-05-07T20:02:16.8541889Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.8542254Z 2025-05-07T20:02:16.8543058Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.8544242Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.8544701Z ^ 2025-05-07T20:02:16.8544994Z detected during: 2025-05-07T20:02:16.8559949Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:16.8588802Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:16.8617299Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:16.8633941Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:16.8635068Z 2025-05-07T20:02:16.8635300Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.8635646Z 2025-05-07T20:02:16.8636439Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.8637541Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.8637977Z ^ 2025-05-07T20:02:16.8638218Z detected during: 2025-05-07T20:02:16.8652787Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:16.8681961Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:16.8710965Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:16.8727052Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:16.8728176Z 2025-05-07T20:02:16.8728439Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.8728794Z 2025-05-07T20:02:16.8729576Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.8730714Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.8731151Z ^ 2025-05-07T20:02:16.8731430Z detected during: 2025-05-07T20:02:16.8746520Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:16.8775173Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:16.8804367Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:16.8820985Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:16.8822200Z 2025-05-07T20:02:16.8822451Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.8822833Z 2025-05-07T20:02:16.8823659Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.8824813Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.8825285Z ^ 2025-05-07T20:02:16.8825580Z detected during: 
2025-05-07T20:02:16.8840395Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:16.8869509Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:16.8898403Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:16.8914846Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:16.8915970Z 2025-05-07T20:02:16.8916234Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.8916589Z 2025-05-07T20:02:16.8917375Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:16.8918511Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:16.8918954Z ^ 2025-05-07T20:02:16.8919263Z detected during: 2025-05-07T20:02:16.8934041Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:16.8962937Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:02:16.8992335Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:16.9008469Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:16.9009616Z 2025-05-07T20:02:16.9009858Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:16.9010209Z 2025-05-07T20:02:17.4714371Z [146/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o 2025-05-07T20:02:17.4726992Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:17.4728557Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:17.4729703Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:17.4730152Z ^ 2025-05-07T20:02:17.4730322Z 2025-05-07T20:02:17.4730588Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:17.4730940Z 2025-05-07T20:02:17.4731746Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:17.4732906Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:17.4733357Z ^ 2025-05-07T20:02:17.4733528Z 2025-05-07T20:02:17.4734305Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:17.4735437Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:17.4735878Z ^ 2025-05-07T20:02:17.4736160Z detected during: 2025-05-07T20:02:17.4751600Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:17.4780263Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:17.4809012Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:17.4825377Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:17.4826557Z 2025-05-07T20:02:17.4826813Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:17.4827179Z 2025-05-07T20:02:17.4828019Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:17.4829202Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:17.4829805Z ^ 2025-05-07T20:02:17.4830076Z detected during: 2025-05-07T20:02:17.4844762Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:17.4874462Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:17.4903412Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:17.4919673Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:17.4920815Z 2025-05-07T20:02:17.4921082Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:17.4921435Z 2025-05-07T20:02:17.4922250Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:17.4923382Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:17.4923811Z ^ 2025-05-07T20:02:17.4924078Z detected during: 2025-05-07T20:02:17.4938872Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:17.4967205Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:17.4996470Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:17.5012426Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:17.5013573Z 2025-05-07T20:02:17.5013820Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:17.5014199Z 2025-05-07T20:02:17.5014993Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:17.5016118Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:17.5016577Z ^ 2025-05-07T20:02:17.5016858Z detected during: 2025-05-07T20:02:17.5032089Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:17.5060027Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:17.5089119Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:17.5105524Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:17.5106671Z 2025-05-07T20:02:17.5106942Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:17.5107307Z 2025-05-07T20:02:17.5108107Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:17.5109280Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:17.5109877Z ^ 2025-05-07T20:02:17.5110136Z detected during: 
2025-05-07T20:02:17.5125077Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:17.5153675Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:17.5182682Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:17.5199802Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:17.5200925Z 2025-05-07T20:02:17.5201172Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:17.5201546Z 2025-05-07T20:02:17.5202325Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:17.5203490Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:17.5203927Z ^ 2025-05-07T20:02:17.5204239Z detected during: 2025-05-07T20:02:17.5219176Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:17.5247695Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:02:17.5276989Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:17.5293430Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:17.5294574Z 2025-05-07T20:02:17.5294815Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:17.5295165Z 2025-05-07T20:02:18.1312714Z [147/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o 2025-05-07T20:02:18.1325221Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:18.1326780Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.1327907Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.1328451Z ^ 2025-05-07T20:02:18.1328626Z 2025-05-07T20:02:18.1328896Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.1329250Z 2025-05-07T20:02:18.1330132Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.1331297Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:18.1331726Z ^ 2025-05-07T20:02:18.1331916Z 2025-05-07T20:02:18.1332693Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.1333816Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.1334251Z ^ 2025-05-07T20:02:18.1334522Z detected during: 2025-05-07T20:02:18.1349917Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:18.1378447Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:18.1407188Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:18.1423799Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:18.1424976Z 2025-05-07T20:02:18.1425223Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.1425584Z 2025-05-07T20:02:18.1426414Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.1427566Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.1428043Z ^ 2025-05-07T20:02:18.1428314Z detected during: 2025-05-07T20:02:18.1443250Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:18.1474872Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:18.1503889Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:18.1520076Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:18.1521218Z 2025-05-07T20:02:18.1521463Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.1521820Z 
2025-05-07T20:02:18.1522630Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.1523755Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.1524213Z ^ 2025-05-07T20:02:18.1524476Z detected during: 2025-05-07T20:02:18.1540630Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:18.1569265Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:18.1598192Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:18.1614163Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:18.1615294Z 2025-05-07T20:02:18.1615527Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.1615885Z 2025-05-07T20:02:18.1617170Z ptxas /tmp/tmpxft_00008c92_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 889; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:18.1619885Z ptxas 
/tmp/tmpxft_00008c92_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 896; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:18.1622431Z ptxas /tmp/tmpxft_00008c92_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 903; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:18.1624950Z ptxas /tmp/tmpxft_00008c92_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 910; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:18.1627497Z ptxas /tmp/tmpxft_00008c92_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1044; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:18.1630135Z ptxas /tmp/tmpxft_00008c92_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1051; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:18.1632667Z ptxas 
/tmp/tmpxft_00008c92_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1058; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:18.1635133Z ptxas /tmp/tmpxft_00008c92_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1065; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:18.1637193Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.1638305Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.1638738Z ^ 2025-05-07T20:02:18.1638983Z detected during: 2025-05-07T20:02:18.1653638Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:18.1682510Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, 
cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:18.1711415Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, 
int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:18.1727336Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:18.1728452Z 2025-05-07T20:02:18.1728692Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.1729049Z 2025-05-07T20:02:18.1729829Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.1730937Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.1731356Z ^ 2025-05-07T20:02:18.1731653Z detected during: 2025-05-07T20:02:18.1746605Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:18.1774997Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:18.1803795Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:18.1820007Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:18.1821176Z 2025-05-07T20:02:18.1821424Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.1821778Z 2025-05-07T20:02:18.1822614Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the 
implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.1823758Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.1824197Z ^ 2025-05-07T20:02:18.1824442Z detected during: 2025-05-07T20:02:18.1839281Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:18.1867985Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, 
cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:18.1896561Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:18.1913251Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:18.1914358Z 2025-05-07T20:02:18.1914591Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.1914952Z 2025-05-07T20:02:19.1052305Z [148/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16.cu -o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o 2025-05-07T20:02:19.1064917Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:19.1066496Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.1067804Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.1068245Z ^ 2025-05-07T20:02:19.1068427Z 2025-05-07T20:02:19.1068672Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.1069113Z 2025-05-07T20:02:19.1069947Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.1071166Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:19.1071602Z ^ 2025-05-07T20:02:19.1071766Z 2025-05-07T20:02:19.1072761Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:19.1074059Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:19.1074550Z ^ 2025-05-07T20:02:19.1074770Z 2025-05-07T20:02:19.1075716Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:19.1077007Z __inline__ 
__attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:19.1077502Z ^ 2025-05-07T20:02:19.1077735Z 2025-05-07T20:02:19.1078685Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:19.1080110Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:19.1080580Z ^ 2025-05-07T20:02:19.1080795Z 2025-05-07T20:02:19.1081029Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.1081389Z 2025-05-07T20:02:19.1082329Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:19.1083584Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:19.1084045Z ^ 2025-05-07T20:02:19.1084287Z 2025-05-07T20:02:19.1085201Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:19.1086456Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:19.1086914Z ^ 2025-05-07T20:02:19.1087137Z 2025-05-07T20:02:19.1087372Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.1087713Z 2025-05-07T20:02:19.1088628Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first 
declaration 2025-05-07T20:02:19.1089866Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:19.1090345Z ^ 2025-05-07T20:02:19.1090573Z 2025-05-07T20:02:19.1091494Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:19.1092775Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:19.1093238Z ^ 2025-05-07T20:02:19.1093449Z 2025-05-07T20:02:19.1093709Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.1094061Z 2025-05-07T20:02:19.1094960Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:19.1096212Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:19.1096671Z ^ 2025-05-07T20:02:19.1096910Z 2025-05-07T20:02:19.1098066Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:19.1099368Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:19.1099834Z ^ 2025-05-07T20:02:19.1100064Z 2025-05-07T20:02:19.1100301Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.1100650Z 2025-05-07T20:02:19.1101574Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a 
function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:19.1102885Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:19.1103371Z ^ 2025-05-07T20:02:19.1103603Z 2025-05-07T20:02:19.1104565Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:19.1105857Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:19.1106331Z ^ 2025-05-07T20:02:19.1106546Z 2025-05-07T20:02:19.1106784Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.1107145Z 2025-05-07T20:02:19.1108067Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:19.1109355Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:19.1109935Z ^ 2025-05-07T20:02:19.1110174Z 2025-05-07T20:02:19.1111078Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:19.1112333Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:19.1112792Z ^ 2025-05-07T20:02:19.1113002Z 2025-05-07T20:02:19.1113249Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.1113617Z 2025-05-07T20:02:19.1114516Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): 
warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:19.1115768Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:19.1116267Z ^ 2025-05-07T20:02:19.1116496Z 2025-05-07T20:02:22.5851414Z [149/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a 
-gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o 2025-05-07T20:02:22.5864335Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:22.5865908Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.5867060Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.5867666Z ^ 2025-05-07T20:02:22.5867850Z 2025-05-07T20:02:22.5868093Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.5868453Z 2025-05-07T20:02:22.5869279Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.5870522Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:22.5870959Z ^ 2025-05-07T20:02:22.5871124Z 2025-05-07T20:02:22.5873228Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.5874392Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.5874839Z ^ 2025-05-07T20:02:22.5875085Z detected during: 2025-05-07T20:02:22.5890065Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.5919019Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.5948162Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:22.5964676Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:22.5965834Z 2025-05-07T20:02:22.5966082Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.5966441Z 2025-05-07T20:02:22.5967255Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.5968517Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.5968967Z ^ 2025-05-07T20:02:22.5969233Z detected during: 2025-05-07T20:02:22.5984348Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.6013124Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.6041993Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.6058372Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:22.6059512Z 2025-05-07T20:02:22.6059772Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.6060129Z 
2025-05-07T20:02:22.6060927Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.6062099Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.6062542Z ^ 2025-05-07T20:02:22.6062794Z detected during: 2025-05-07T20:02:22.6077967Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.6106511Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.6134996Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.6151666Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:22.6152802Z 2025-05-07T20:02:22.6153041Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.6153397Z 2025-05-07T20:02:22.6154200Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.6155309Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.6155723Z ^ 2025-05-07T20:02:22.6155976Z detected during: 2025-05-07T20:02:22.6170916Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.6199945Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.6228596Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.6244868Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:22.6245990Z 2025-05-07T20:02:22.6246227Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.6246581Z 2025-05-07T20:02:22.6247379Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.6248487Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.6248918Z ^ 2025-05-07T20:02:22.6249160Z detected during: 2025-05-07T20:02:22.6264202Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.6293551Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.6322627Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.6338844Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:22.6339989Z 2025-05-07T20:02:22.6340232Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.6340597Z 2025-05-07T20:02:22.6341396Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.6342534Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.6342991Z ^ 2025-05-07T20:02:22.6343249Z detected during: 2025-05-07T20:02:22.6358205Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.6387018Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.6415577Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.6432268Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:22.6433387Z 2025-05-07T20:02:22.6433621Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.6433967Z 2025-05-07T20:02:22.6542273Z [150/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o 2025-05-07T20:02:22.6554732Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:22.6556270Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.6557385Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.6557813Z ^ 2025-05-07T20:02:22.6557978Z 2025-05-07T20:02:22.6558227Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.6558573Z 2025-05-07T20:02:22.6559362Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.6560483Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:22.6560910Z ^ 2025-05-07T20:02:22.6561071Z 2025-05-07T20:02:22.6561840Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.6562997Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.6563413Z ^ 2025-05-07T20:02:22.6563673Z detected during: 2025-05-07T20:02:22.6579022Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.6607131Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.6636100Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.6652308Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:22.6653425Z 2025-05-07T20:02:22.6653663Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.6654009Z 2025-05-07T20:02:22.6654802Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.6655907Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.6656344Z ^ 2025-05-07T20:02:22.6656589Z detected during: 2025-05-07T20:02:22.6671936Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.6700288Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.6728904Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.6745339Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:22.6746479Z 2025-05-07T20:02:22.6746737Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.6747092Z 2025-05-07T20:02:22.6747888Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.6749035Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.6749463Z ^ 2025-05-07T20:02:22.6749834Z detected during: 2025-05-07T20:02:22.6764501Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.6793344Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.6822040Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.6838495Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:22.6839611Z 2025-05-07T20:02:22.6839847Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.6849772Z 2025-05-07T20:02:22.6850641Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.6851814Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.6852344Z ^ 2025-05-07T20:02:22.6852639Z detected during: 2025-05-07T20:02:22.6868040Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.6896301Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.6925442Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.6942191Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:22.6943343Z 2025-05-07T20:02:22.6943614Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.6944006Z 2025-05-07T20:02:22.6944817Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.6945981Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.6946451Z ^ 2025-05-07T20:02:22.6946724Z detected during: 
2025-05-07T20:02:22.6961534Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.6990466Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.7020067Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.7036695Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:22.7037817Z 2025-05-07T20:02:22.7038061Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.7038439Z 2025-05-07T20:02:22.7039225Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.7040366Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.7040807Z ^ 2025-05-07T20:02:22.7041094Z detected during: 2025-05-07T20:02:22.7055772Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.7084567Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:02:22.7113620Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.7129809Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:22.7130947Z 2025-05-07T20:02:22.7131191Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.7131544Z 2025-05-07T20:02:23.7113070Z [151/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o 2025-05-07T20:02:23.7125869Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:23.7127434Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.7128584Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.7129026Z ^ 2025-05-07T20:02:23.7129226Z 2025-05-07T20:02:23.7129471Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.7129829Z 2025-05-07T20:02:23.7130648Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.7131787Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:23.7132244Z ^ 2025-05-07T20:02:23.7132416Z 2025-05-07T20:02:23.7133202Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.7134345Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.7134809Z ^ 2025-05-07T20:02:23.7135170Z detected during: 2025-05-07T20:02:23.7150393Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.7177940Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.7205856Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:23.7221616Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:23.7222818Z 2025-05-07T20:02:23.7223072Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.7223460Z 2025-05-07T20:02:23.7224267Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.7225446Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.7225918Z ^ 2025-05-07T20:02:23.7226213Z detected during: 2025-05-07T20:02:23.7240419Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.7267841Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.7295667Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.7311991Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:23.7313083Z 2025-05-07T20:02:23.7313348Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.7313676Z 
2025-05-07T20:02:23.7314408Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.7315465Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.7315877Z ^ 2025-05-07T20:02:23.7316108Z detected during: 2025-05-07T20:02:23.7329955Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.7357449Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.7386863Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.7402367Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:23.7403439Z 2025-05-07T20:02:23.7403665Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.7404004Z 2025-05-07T20:02:23.7405147Z ptxas /tmp/tmpxft_00008c90_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 889; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:23.7407501Z ptxas 
/tmp/tmpxft_00008c90_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 896; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:23.7409836Z ptxas /tmp/tmpxft_00008c90_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 903; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:23.7412249Z ptxas /tmp/tmpxft_00008c90_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 910; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:23.7414598Z ptxas /tmp/tmpxft_00008c90_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1044; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:23.7416955Z ptxas /tmp/tmpxft_00008c90_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1051; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:23.7419701Z ptxas 
/tmp/tmpxft_00008c90_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1058; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:23.7422238Z ptxas /tmp/tmpxft_00008c90_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1065; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:23.7424401Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.7425575Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.7426010Z ^ 2025-05-07T20:02:23.7426274Z detected during: 2025-05-07T20:02:23.7440425Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.7467721Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, 
cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.7495687Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, 
int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.7512227Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:23.7513351Z 2025-05-07T20:02:23.7513588Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.7513937Z 2025-05-07T20:02:23.7514724Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.7515816Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.7516246Z ^ 2025-05-07T20:02:23.7516494Z detected during: 2025-05-07T20:02:23.7530923Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.7558853Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.7587563Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.7602998Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:23.7604061Z 2025-05-07T20:02:23.7604298Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.7604626Z 2025-05-07T20:02:23.7605359Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the 
implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.7606418Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.7606817Z ^ 2025-05-07T20:02:23.7607063Z detected during: 2025-05-07T20:02:23.7621326Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.7649064Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, 
cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.7677723Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.7693631Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:23.7694704Z 2025-05-07T20:02:23.7694928Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.7695256Z 2025-05-07T20:03:00.1433208Z [152/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu -o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o 2025-05-07T20:03:00.1445775Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:03:00.1447315Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:03:00.1448428Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:03:00.1448866Z ^ 2025-05-07T20:03:00.1449029Z 2025-05-07T20:03:00.1449276Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:00.1449623Z 2025-05-07T20:03:00.1450420Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:03:00.1451551Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:03:00.1451972Z ^ 2025-05-07T20:03:00.1452145Z 2025-05-07T20:03:00.1453064Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:00.1454437Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:00.1454900Z ^ 2025-05-07T20:03:00.1455129Z 2025-05-07T20:03:00.1456042Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:00.1457577Z 
__inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:00.1458060Z ^ 2025-05-07T20:03:00.1458309Z 2025-05-07T20:03:00.1459242Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:00.1460539Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:00.1461005Z ^ 2025-05-07T20:03:00.1461224Z 2025-05-07T20:03:00.1461484Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:00.1461836Z 2025-05-07T20:03:00.1462764Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:00.1464057Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:00.1464543Z ^ 2025-05-07T20:03:00.1464779Z 2025-05-07T20:03:00.1465720Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:00.1467021Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:00.1467706Z ^ 2025-05-07T20:03:00.1467923Z 2025-05-07T20:03:00.1468165Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:00.1468534Z 2025-05-07T20:03:00.1469520Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on 
its first declaration 2025-05-07T20:03:00.1470811Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:00.1471294Z ^ 2025-05-07T20:03:00.1471527Z 2025-05-07T20:03:00.1472474Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:00.1473761Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:00.1474239Z ^ 2025-05-07T20:03:00.1474453Z 2025-05-07T20:03:00.1474706Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:00.1475055Z 2025-05-07T20:03:00.1475977Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:00.1477311Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:00.1477795Z ^ 2025-05-07T20:03:00.1478031Z 2025-05-07T20:03:00.1478998Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:00.1480383Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:00.1480837Z ^ 2025-05-07T20:03:00.1481059Z 2025-05-07T20:03:00.1481290Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:00.1481628Z 2025-05-07T20:03:00.1482545Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a 
function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:00.1483785Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:00.1484259Z ^ 2025-05-07T20:03:00.1484485Z 2025-05-07T20:03:00.1485407Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:00.1486659Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:00.1487127Z ^ 2025-05-07T20:03:00.1487336Z 2025-05-07T20:03:00.1487582Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:00.1487921Z 2025-05-07T20:03:00.1488821Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:00.1490153Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:00.1490618Z ^ 2025-05-07T20:03:00.1490858Z 2025-05-07T20:03:00.1491808Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:00.1493070Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:00.1493521Z ^ 2025-05-07T20:03:00.1493742Z 2025-05-07T20:03:00.1493972Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:00.1494312Z 2025-05-07T20:03:00.1495228Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): 
warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:00.1496470Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:00.1496943Z ^ 2025-05-07T20:03:00.1497231Z 2025-05-07T20:03:03.0229287Z [153/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a 
-gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o 2025-05-07T20:03:03.0240975Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:03:03.0242553Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:03:03.0243627Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:03:03.0244042Z ^ 2025-05-07T20:03:03.0244219Z 2025-05-07T20:03:03.0244445Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:03.0244775Z 2025-05-07T20:03:03.0245548Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:03:03.0246615Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:03:03.0247029Z ^ 2025-05-07T20:03:03.0247185Z 2025-05-07T20:03:03.0248057Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:03.0249304Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:03.0249806Z ^ 2025-05-07T20:03:03.0250044Z 2025-05-07T20:03:03.0250909Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:03.0252142Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:03.0252646Z ^ 2025-05-07T20:03:03.0252908Z 2025-05-07T20:03:03.0253781Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation 
is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:03.0255029Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:03.0255497Z ^ 2025-05-07T20:03:03.0255713Z 2025-05-07T20:03:03.0255977Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:03.0256321Z 2025-05-07T20:03:03.0257255Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:03.0258746Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:03.0259281Z ^ 2025-05-07T20:03:03.0259531Z 2025-05-07T20:03:03.0260481Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:03.0261825Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:03.0262328Z ^ 2025-05-07T20:03:03.0262591Z 2025-05-07T20:03:03.0262834Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:03.0263202Z 2025-05-07T20:03:03.0264230Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:03.0265426Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:03.0265915Z ^ 2025-05-07T20:03:03.0266139Z 2025-05-07T20:03:03.0267016Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:03.0268592Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:03.0269083Z ^ 2025-05-07T20:03:03.0269302Z 2025-05-07T20:03:03.0269561Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:03.0269913Z 2025-05-07T20:03:03.0270842Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:03.0272144Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:03.0272698Z ^ 2025-05-07T20:03:03.0272933Z 2025-05-07T20:03:03.0273871Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:03.0275239Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:03.0275712Z ^ 2025-05-07T20:03:03.0275969Z 2025-05-07T20:03:03.0276220Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:03.0276585Z 2025-05-07T20:03:03.0277542Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:03.0278845Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:03.0279373Z ^ 
2025-05-07T20:03:03.0279625Z 2025-05-07T20:03:03.0280764Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:03.0281975Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:03.0282458Z ^ 2025-05-07T20:03:03.0282668Z 2025-05-07T20:03:03.0282927Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:03.0283263Z 2025-05-07T20:03:03.0284132Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:03.0285360Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:03.0285865Z ^ 2025-05-07T20:03:03.0286116Z 2025-05-07T20:03:03.0287026Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:03.0288226Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:03.0288659Z ^ 2025-05-07T20:03:03.0288876Z 2025-05-07T20:03:03.0289096Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:03.0289418Z 2025-05-07T20:03:03.0290291Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:03.0291468Z __inline__ __attribute__((always_inline)) __attribute__((device)) 
__attribute__((host)) 2025-05-07T20:03:03.0291926Z ^ 2025-05-07T20:03:03.0292142Z 2025-05-07T20:03:03.6637702Z [154/156] : && /github/home/miniconda/envs/build_binary/bin/c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,fbgemm_gpu_experimental_gen_ai.so -o experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib:/github/home/miniconda/envs/build_binary/lib/stubs: /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libtorch.so /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so 
-Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/torch/lib/libtorch.so" -Wl,--as-needed -lcudadevrt -lcudart_static -ldl && : 2025-05-07T20:03:03.9231008Z [155/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-build/experimental/gen_ai && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:03.9232481Z ################################################################################ 2025-05-07T20:03:03.9232855Z [CMAKE] Running post-build script ... 2025-05-07T20:03:03.9233575Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:03.9234514Z Removing all RPATHs ... 2025-05-07T20:03:03.9234916Z ################################################################################ 2025-05-07T20:03:03.9235870Z [155/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-build && /github/home/miniconda/envs/build_binary/lib/python3.9/site-packages/cmake/data/bin/cmake -P cmake_install.cmake 2025-05-07T20:03:04.0005478Z -- Install configuration: "Release" 2025-05-07T20:03:04.0037022Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/asmjit.so 2025-05-07T20:03:04.0088983Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/fbgemm.so 2025-05-07T20:03:04.0136115Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:04.0158195Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/bench 2025-05-07T20:03:04.0182081Z -- Installing: 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/bench/__init__.py 2025-05-07T20:03:04.0187627Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/bench/ck_bf16_bench.py 2025-05-07T20:03:04.0195125Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/bench/comm_bench.py 2025-05-07T20:03:04.0198281Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/bench/gather_scatter_bench.py 2025-05-07T20:03:04.0201327Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/bench/quantize_bench.py 2025-05-07T20:03:04.0202370Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/bench/quantize_ops.py 2025-05-07T20:03:04.0212096Z -- Up-to-date: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:04.0217272Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:04.0250957Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/README.md 2025-05-07T20:03:04.0252347Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/__init__.py 2025-05-07T20:03:04.0253501Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/activation.py 2025-05-07T20:03:04.0254604Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py 2025-05-07T20:03:04.0255635Z -- Installing: 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/layers.py 2025-05-07T20:03:04.0256677Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/shuffling.py 2025-05-07T20:03:04.0258065Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/__init__.py 2025-05-07T20:03:04.0259683Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/quantize.py 2025-05-07T20:03:04.0305402Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:04.0330338Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/example/__init__.py 2025-05-07T20:03:04.0336702Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/example/utils.py 2025-05-07T20:03:04.0387634Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py 2025-05-07T20:03:04.0391334Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py 2025-05-07T20:03:04.0392496Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py 2025-05-07T20:03:04.0393847Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py 2025-05-07T20:03:04.0394966Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py 2025-05-07T20:03:04.0623214Z 2025-05-07T20:03:04.3365508Z 
2025-05-07T20:03:04.3371865Z copying fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/__init__.py 2025-05-07T20:03:04.3478334Z copying fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/batched_unary_embeddings_ops.py 2025-05-07T20:03:04.3482412Z copying fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/enums.py 2025-05-07T20:03:04.3487162Z copying fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/metrics.py 2025-05-07T20:03:04.3493108Z copying fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules.py 2025-05-07T20:03:04.3499330Z copying fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules_split.py 2025-05-07T20:03:04.3503093Z copying fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/quantize_comm.py 2025-05-07T20:03:04.3510914Z copying fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/quantize_utils.py 2025-05-07T20:03:04.3516222Z copying fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/runtime_monitor.py 2025-05-07T20:03:04.3520311Z copying fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sparse_ops.py 2025-05-07T20:03:04.3531607Z copying fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_embedding_configs.py 2025-05-07T20:03:04.3539552Z copying fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_embedding_inference_converter.py 2025-05-07T20:03:04.3550075Z copying fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_embedding_optimizer_ops.py 2025-05-07T20:03:04.3554327Z copying 
fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_embedding_utils.py 2025-05-07T20:03:04.3559009Z copying fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops.py 2025-05-07T20:03:04.3563788Z copying fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_common.py 2025-05-07T20:03:04.3569100Z copying fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py 2025-05-07T20:03:04.3581358Z copying fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training.py 2025-05-07T20:03:04.3604481Z copying fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py 2025-05-07T20:03:04.3610920Z copying fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py 2025-05-07T20:03:04.3614860Z copying fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe_input_multiplexer.py 2025-05-07T20:03:04.3618578Z copying fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/uvm.py 2025-05-07T20:03:04.3629324Z creating directory _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/config 2025-05-07T20:03:04.3669224Z copying fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/config/__init__.py 2025-05-07T20:03:04.3671162Z copying fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/config/feature_list.py 2025-05-07T20:03:04.3674653Z creating directory 
_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs 2025-05-07T20:03:04.3697700Z copying fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/__init__.py 2025-05-07T20:03:04.3698728Z copying fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/common.py 2025-05-07T20:03:04.3702808Z copying fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/examples.py 2025-05-07T20:03:04.3710117Z copying fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/jagged_tensor_ops.py 2025-05-07T20:03:04.3714684Z copying fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/merge_pooled_embedding_ops.py 2025-05-07T20:03:04.3718803Z copying fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/permute_pooled_embedding_ops.py 2025-05-07T20:03:04.3723006Z copying fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/quantize_ops.py 2025-05-07T20:03:04.3727814Z copying fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/sparse_ops.py 2025-05-07T20:03:04.3761916Z copying fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/version.py 2025-05-07T20:03:04.3762999Z creating directory _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/quantize 2025-05-07T20:03:04.3763767Z copying fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/quantize/__init__.py 2025-05-07T20:03:04.3776711Z copying fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/quantize/quantize_ops.py 2025-05-07T20:03:04.3780606Z creating directory _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll 2025-05-07T20:03:04.3796800Z copying fbgemm_gpu/sll/__init__.py -> 
_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/__init__.py 2025-05-07T20:03:04.3799299Z creating directory _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe 2025-05-07T20:03:04.3820060Z copying fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/__init__.py 2025-05-07T20:03:04.3823988Z creating directory _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton 2025-05-07T20:03:04.3824723Z copying fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/__init__.py 2025-05-07T20:03:04.3830849Z copying fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/common.py 2025-05-07T20:03:04.3836855Z copying fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/quantize.py 2025-05-07T20:03:04.3853766Z copying fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/quantize_ref.py 2025-05-07T20:03:04.3855181Z creating directory _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/utils 2025-05-07T20:03:04.3855906Z copying fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/utils/__init__.py 2025-05-07T20:03:04.3860380Z copying fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/utils/filestore.py 2025-05-07T20:03:04.3865941Z copying fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/utils/loader.py 2025-05-07T20:03:04.3869785Z copying fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/utils/torch_library.py 2025-05-07T20:03:04.3876211Z creating directory _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/cpu 2025-05-07T20:03:04.3878363Z copying fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/cpu/__init__.py 2025-05-07T20:03:04.3882203Z copying fbgemm_gpu/sll/cpu/cpu_sll.py -> 
_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/cpu/cpu_sll.py 2025-05-07T20:03:04.3889132Z creating directory _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/meta 2025-05-07T20:03:04.3889884Z copying fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/meta/__init__.py 2025-05-07T20:03:04.3893997Z copying fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/meta/meta_sll.py 2025-05-07T20:03:04.3901067Z creating directory _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton 2025-05-07T20:03:04.3903313Z copying fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/__init__.py 2025-05-07T20:03:04.3909969Z copying fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/common.py 2025-05-07T20:03:04.3916757Z copying fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py 2025-05-07T20:03:04.3921133Z copying fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py 2025-05-07T20:03:04.3924998Z copying fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm.py 2025-05-07T20:03:04.3930104Z copying fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py 2025-05-07T20:03:04.3943097Z copying fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py 2025-05-07T20:03:04.3949845Z copying fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> 
_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py 2025-05-07T20:03:04.3953519Z copying fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py 2025-05-07T20:03:04.3962452Z copying fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py 2025-05-07T20:03:04.3970353Z copying fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py 2025-05-07T20:03:04.3974651Z copying fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_softmax.py 2025-05-07T20:03:04.3985198Z copying fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py 2025-05-07T20:03:04.3991117Z creating directory _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench 2025-05-07T20:03:04.3991887Z copying fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/__init__.py 2025-05-07T20:03:04.3996081Z copying fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/bench_config.py 2025-05-07T20:03:04.4003625Z copying fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/bench_runs.py 2025-05-07T20:03:04.4010491Z copying fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/eeg_cli.py 2025-05-07T20:03:04.4014137Z copying fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> 
_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py 2025-05-07T20:03:04.4021268Z copying fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/eval_compression.py 2025-05-07T20:03:04.4026933Z copying fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/reporter.py 2025-05-07T20:03:04.4031556Z copying fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config.py 2025-05-07T20:03:04.4037419Z copying fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py 2025-05-07T20:03:04.4044142Z copying fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py 2025-05-07T20:03:04.4048750Z copying fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/utils.py 2025-05-07T20:03:04.4052616Z creating directory _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/cache 2025-05-07T20:03:04.4053370Z copying fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/cache/__init__.py 2025-05-07T20:03:04.4069237Z copying fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py 2025-05-07T20:03:04.4072270Z creating directory _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:04.4072992Z copying fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/__init__.py 2025-05-07T20:03:04.4079907Z copying fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/common.py 2025-05-07T20:03:04.4087132Z copying fbgemm_gpu/tbe/ssd/inference.py -> 
_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/inference.py 2025-05-07T20:03:04.4097337Z copying fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/training.py 2025-05-07T20:03:04.4108113Z creating directory _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/utils 2025-05-07T20:03:04.4109959Z copying fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/utils/__init__.py 2025-05-07T20:03:04.4117081Z copying fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/utils/common.py 2025-05-07T20:03:04.4121139Z copying fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/utils/offsets.py 2025-05-07T20:03:04.4127795Z copying fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/utils/quantize.py 2025-05-07T20:03:04.4132319Z copying fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/utils/requests.py 2025-05-07T20:03:04.4139790Z creating directory _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/stats 2025-05-07T20:03:04.4140545Z copying fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/stats/__init__.py 2025-05-07T20:03:04.4144988Z copying fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/stats/bench_params_reporter.py 2025-05-07T20:03:04.4149439Z creating directory _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:04.4150235Z copying fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/utils/__init__.py 2025-05-07T20:03:04.4161453Z copying fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py 2025-05-07T20:03:04.4166734Z creating directory 
_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/jagged 2025-05-07T20:03:04.4167645Z copying fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/jagged/__init__.py 2025-05-07T20:03:04.4172174Z copying fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py 2025-05-07T20:03:04.4251875Z 2025-05-07T20:03:04.9015045Z INFO:root:running bdist_wheel 2025-05-07T20:03:05.0301971Z INFO:root:running build 2025-05-07T20:03:05.0319205Z INFO:root:running build_py 2025-05-07T20:03:05.0530377Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0556210Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0560646Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0562243Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0563937Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0565856Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0567815Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0569536Z INFO:root:copying 
_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0594924Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0596491Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0597858Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0599185Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0600593Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0602055Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0603447Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0604834Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0606310Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 
2025-05-07T20:03:05.0608561Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0611210Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0614737Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0616296Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0618026Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0619831Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.0622374Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/config 2025-05-07T20:03:05.0623721Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/config 2025-05-07T20:03:05.0625650Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/config 2025-05-07T20:03:05.0628582Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.0651503Z 
INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.0652874Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.0654361Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.0655785Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.0657352Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.0658860Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.0660309Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.0661651Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.0679650Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.0681876Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/quantize 2025-05-07T20:03:05.0683143Z INFO:root:copying 
_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/quantize 2025-05-07T20:03:05.0684937Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/quantize 2025-05-07T20:03:05.0687257Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll 2025-05-07T20:03:05.0688588Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll 2025-05-07T20:03:05.0690830Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe 2025-05-07T20:03:05.0692120Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe 2025-05-07T20:03:05.0694600Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton 2025-05-07T20:03:05.0695874Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton 2025-05-07T20:03:05.0697712Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton 2025-05-07T20:03:05.0699576Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton 2025-05-07T20:03:05.0701512Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton 2025-05-07T20:03:05.0704665Z 
INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/utils 2025-05-07T20:03:05.0705908Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/utils 2025-05-07T20:03:05.0707674Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/utils 2025-05-07T20:03:05.0709380Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/utils 2025-05-07T20:03:05.0711084Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/utils 2025-05-07T20:03:05.0713338Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/cpu 2025-05-07T20:03:05.0714610Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/cpu 2025-05-07T20:03:05.0716338Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/cpu 2025-05-07T20:03:05.0718909Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/meta 2025-05-07T20:03:05.0720203Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/meta 2025-05-07T20:03:05.0722023Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/meta 
2025-05-07T20:03:05.0725499Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.0726776Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.0728538Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.0730362Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.0732002Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.0733565Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.0735119Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.0737244Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.0739225Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.0740951Z 
INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.0742661Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.0744441Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.0746131Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.0748137Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.0758089Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.0759435Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.0761234Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.0762910Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.0764699Z INFO:root:copying 
_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.0766409Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.0768159Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.0770038Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.0771619Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.0773544Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.0775269Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.0776825Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.0779402Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/cache 2025-05-07T20:03:05.0780620Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/cache/__init__.py -> 
_skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/cache 2025-05-07T20:03:05.0782433Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/cache 2025-05-07T20:03:05.0784693Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:05.0785950Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:05.0787637Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:05.0789247Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:05.0791246Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:05.0794856Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/utils 2025-05-07T20:03:05.0796143Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/utils 2025-05-07T20:03:05.0797898Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/utils 2025-05-07T20:03:05.0799561Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/utils/offsets.py -> 
_skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/utils 2025-05-07T20:03:05.0801115Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/utils 2025-05-07T20:03:05.0802733Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/utils 2025-05-07T20:03:05.0805139Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/stats 2025-05-07T20:03:05.0806397Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/stats 2025-05-07T20:03:05.0808198Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/stats 2025-05-07T20:03:05.0810494Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:05.0812406Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:05.0814220Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:05.0816460Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton/jagged 2025-05-07T20:03:05.0817771Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/jagged/__init__.py -> 
_skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton/jagged 2025-05-07T20:03:05.0819815Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton/jagged 2025-05-07T20:03:05.1084065Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/asmjit.so -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.1124754Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/fbgemm.so -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.1379307Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:05.1380678Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:05.5276044Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/bench 2025-05-07T20:03:05.5277576Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/bench/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/bench 2025-05-07T20:03:05.5279243Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/bench/ck_bf16_bench.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/bench 2025-05-07T20:03:05.5284950Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/bench/comm_bench.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/bench 2025-05-07T20:03:05.5290862Z INFO:root:copying 
_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/bench/gather_scatter_bench.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/bench 2025-05-07T20:03:05.5296739Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/bench/quantize_bench.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/bench 2025-05-07T20:03:05.5308354Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/bench/quantize_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/bench 2025-05-07T20:03:05.5327240Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:05.5331055Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/README.md -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:05.5332968Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:05.5349634Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/activation.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:05.5353295Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:05.5369213Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/layers.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai/moe 
2025-05-07T20:03:05.5382080Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/shuffling.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:05.5387748Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:05.5392859Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gen_ai/quantize.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:05.5402174Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/example 2025-05-07T20:03:05.5406240Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/example 2025-05-07T20:03:05.5437258Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/example/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/example 2025-05-07T20:03:05.5439575Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/example/utils.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/example 2025-05-07T20:03:05.5443663Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:05.5445195Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:05.5450580Z 
INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:05.5473363Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:05.5493561Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:05.5500089Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:05.5504581Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5511141Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5512751Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5514378Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5516155Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5517849Z 
INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5519373Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5521652Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5523294Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5525030Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5527008Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5528730Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5530413Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5531905Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5533608Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops.py -> 
_skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5535141Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5536801Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5539679Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5550879Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5552567Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5554141Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5555708Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu 2025-05-07T20:03:05.5557418Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/config 2025-05-07T20:03:05.5559092Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/config 
2025-05-07T20:03:05.5560657Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.5562338Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.5563925Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.5575569Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.5580300Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.5582742Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.5584679Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.5586356Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.5588364Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs 2025-05-07T20:03:05.5595388Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/quantize/__init__.py -> 
_skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/quantize 2025-05-07T20:03:05.5598685Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/quantize 2025-05-07T20:03:05.5600211Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll 2025-05-07T20:03:05.5601969Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe 2025-05-07T20:03:05.5603705Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton 2025-05-07T20:03:05.5605323Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton 2025-05-07T20:03:05.5606889Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton 2025-05-07T20:03:05.5618221Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton 2025-05-07T20:03:05.5619937Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/utils 2025-05-07T20:03:05.5621693Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/utils 2025-05-07T20:03:05.5623226Z INFO:root:copying 
_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/utils 2025-05-07T20:03:05.5624796Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/utils 2025-05-07T20:03:05.5626915Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/cpu 2025-05-07T20:03:05.5628518Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/cpu 2025-05-07T20:03:05.5630364Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/meta 2025-05-07T20:03:05.5632050Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/meta 2025-05-07T20:03:05.5633704Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.5635461Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.5637240Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.5639207Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> 
_skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.5640912Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.5642566Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.5644245Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.5646026Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.5647801Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.5658441Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.5660294Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.5662051Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.5663976Z 
INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.5665788Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.5667623Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.5669449Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.5671373Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.5673070Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.5674717Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.5678230Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.5679841Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.5681638Z INFO:root:copying 
_skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.5683241Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.5684828Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.5686427Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/cache 2025-05-07T20:03:05.5688138Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/cache 2025-05-07T20:03:05.5689664Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:05.5691266Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:05.5692872Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:05.5694649Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:05.5697603Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/utils/__init__.py -> 
_skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/utils 2025-05-07T20:03:05.5699327Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/utils 2025-05-07T20:03:05.5700968Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/utils 2025-05-07T20:03:05.5702583Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/utils 2025-05-07T20:03:05.5704227Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/utils 2025-05-07T20:03:05.5706065Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/stats 2025-05-07T20:03:05.5707867Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/stats 2025-05-07T20:03:05.5709506Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:05.5711373Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:05.5713119Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/jagged/__init__.py -> 
_skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton/jagged 2025-05-07T20:03:05.5714771Z INFO:root:copying _skbuild/linux-x86_64-3.9/cmake-install/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton/jagged 2025-05-07T20:03:05.5757872Z INFO:skbuild:copied 90 files 2025-05-07T20:03:05.5758180Z INFO:root:running build_ext 2025-05-07T20:03:05.6082117Z INFO:root:installing to _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel 2025-05-07T20:03:05.6082725Z INFO:root:running install 2025-05-07T20:03:05.6406432Z INFO:root:running install_lib 2025-05-07T20:03:05.6439100Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel 2025-05-07T20:03:05.6447971Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu 2025-05-07T20:03:05.6449162Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/config 2025-05-07T20:03:05.6450461Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/config 2025-05-07T20:03:05.6452125Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/config 2025-05-07T20:03:05.6453399Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/docs 2025-05-07T20:03:05.6454721Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:05.6461447Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs/common.py -> 
_skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:05.6463132Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:05.6468237Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:05.6469934Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:05.6471775Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:05.6473454Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:05.6475094Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:05.6476690Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:05.6477881Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/quantize 2025-05-07T20:03:05.6479143Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/quantize/__init__.py -> 
_skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/quantize 2025-05-07T20:03:05.6480898Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/quantize 2025-05-07T20:03:05.6482176Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll 2025-05-07T20:03:05.6482918Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll/cpu 2025-05-07T20:03:05.6484153Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/cpu 2025-05-07T20:03:05.6485759Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/cpu 2025-05-07T20:03:05.6486956Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll/meta 2025-05-07T20:03:05.6488203Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/meta 2025-05-07T20:03:05.6489869Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/meta 2025-05-07T20:03:05.6491088Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll/triton 2025-05-07T20:03:05.6492415Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 
2025-05-07T20:03:05.6494139Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:05.6495941Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:05.6497998Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:05.6499834Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:05.6501705Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:05.6503676Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:05.6505675Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:05.6507751Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> 
_skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:05.6509760Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:05.6511858Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:05.6513839Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:05.6515710Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:05.6517538Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll 2025-05-07T20:03:05.6518743Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe 2025-05-07T20:03:05.6519530Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.6520891Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.6522594Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench/bench_config.py -> 
_skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.6524341Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.6526073Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.6527894Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.6529708Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.6531445Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.6533172Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.6534941Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.6536810Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> 
_skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.6538682Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:05.6539942Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/cache 2025-05-07T20:03:05.6541237Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/cache 2025-05-07T20:03:05.6542982Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/cache 2025-05-07T20:03:05.6544258Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:05.6545092Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:05.6546445Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:05.6548294Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:05.6550120Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:03:05.6551757Z INFO:root:copying 
_skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:03:05.6553448Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:03:05.6555131Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:03:05.6556483Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/utils 2025-05-07T20:03:05.6557743Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:05.6559424Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:05.6561323Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:05.6563043Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:05.6564728Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:05.6565930Z INFO:root:creating 
_skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/stats 2025-05-07T20:03:05.6567106Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/stats 2025-05-07T20:03:05.6568943Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/stats 2025-05-07T20:03:05.6570636Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe 2025-05-07T20:03:05.6571781Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/triton 2025-05-07T20:03:05.6572574Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/triton/jagged 2025-05-07T20:03:05.6573851Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton/jagged 2025-05-07T20:03:05.6575620Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton/jagged 2025-05-07T20:03:05.6577421Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:03:05.6579015Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 
2025-05-07T20:03:05.6580603Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:03:05.6582270Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:03:05.6583477Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/utils 2025-05-07T20:03:05.6584655Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:03:05.6586211Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:03:05.6587786Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:03:05.6589457Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:03:05.6590947Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/asmjit.so -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.6592364Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/fbgemm.so -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.6609013Z INFO:root:creating 
_skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental 2025-05-07T20:03:05.6609921Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:05.6611370Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:05.7201063Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:05.7202962Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai/moe/README.md -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:05.7205094Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai/moe/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:05.7207036Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai/moe/activation.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:05.7209091Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:05.7211041Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai/moe/layers.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:05.7212998Z 
INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai/moe/shuffling.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:05.7214895Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:05.7216867Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gen_ai/quantize.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:05.7218303Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/bench 2025-05-07T20:03:05.7219740Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/bench/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:05.7221567Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/bench/ck_bf16_bench.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:05.7223521Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/bench/comm_bench.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:05.7225474Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/bench/gather_scatter_bench.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:05.7227464Z INFO:root:copying 
_skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/bench/quantize_bench.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:05.7229390Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/bench/quantize_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:05.7230798Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/example 2025-05-07T20:03:05.7232354Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/example 2025-05-07T20:03:05.7234483Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/example/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/example 2025-05-07T20:03:05.7236419Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/example/utils.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/example 2025-05-07T20:03:05.7237847Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gemm 2025-05-07T20:03:05.7238852Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:05.7240368Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:05.7242371Z 
INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:05.7244398Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:05.7246463Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:05.7248536Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:05.7250645Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7252221Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7253825Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7255375Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7257016Z INFO:root:copying 
_skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7258840Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7260516Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7262057Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7263716Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7265317Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7266926Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7268823Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7270509Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/split_embedding_optimizer_ops.py -> 
_skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7272104Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7273788Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7275517Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7277314Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7279136Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7280939Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7282727Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7284350Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/tbe_input_multiplexer.py -> 
_skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7285978Z INFO:root:copying _skbuild/linux-x86_64-3.9/setuptools/lib.linux-x86_64-cpython-39/fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:05.7286834Z INFO:skbuild:copied 115 files 2025-05-07T20:03:05.7287178Z INFO:root:running install_egg_info 2025-05-07T20:03:05.7580362Z INFO:root:running egg_info 2025-05-07T20:03:05.7618151Z INFO:root:creating fbgemm_gpu_genai_nightly.egg-info 2025-05-07T20:03:05.7650099Z INFO:root:writing fbgemm_gpu_genai_nightly.egg-info/PKG-INFO 2025-05-07T20:03:05.7747769Z INFO:root:writing dependency_links to fbgemm_gpu_genai_nightly.egg-info/dependency_links.txt 2025-05-07T20:03:05.7770567Z INFO:root:writing requirements to fbgemm_gpu_genai_nightly.egg-info/requires.txt 2025-05-07T20:03:05.7771307Z INFO:root:writing top-level names to fbgemm_gpu_genai_nightly.egg-info/top_level.txt 2025-05-07T20:03:05.7806882Z INFO:root:writing manifest file 'fbgemm_gpu_genai_nightly.egg-info/SOURCES.txt' 2025-05-07T20:03:05.7995002Z INFO:root:reading manifest file 'fbgemm_gpu_genai_nightly.egg-info/SOURCES.txt' 2025-05-07T20:03:05.8034794Z INFO:root:writing manifest file 'fbgemm_gpu_genai_nightly.egg-info/SOURCES.txt' 2025-05-07T20:03:05.8037336Z INFO:root:Copying fbgemm_gpu_genai_nightly.egg-info to _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu_genai_nightly-2025.5.7-py3.9.egg-info 2025-05-07T20:03:05.8066518Z INFO:root:running install_scripts 2025-05-07T20:03:05.8075886Z INFO:skbuild:copied 0 files 2025-05-07T20:03:12.9945696Z INFO:root:creating _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu_genai_nightly-2025.5.7.dist-info/WHEEL 2025-05-07T20:03:13.0210469Z INFO:wheel:creating '/__w/FBGEMM/FBGEMM/fbgemm_gpu/dist/.tmp-sk97hdnz/fbgemm_gpu_genai_nightly-2025.5.7-cp39-cp39-manylinux_2_28_x86_64.whl' and adding 
'_skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel' to it 2025-05-07T20:03:13.0335961Z INFO:wheel:adding 'fbgemm_gpu/__init__.py' 2025-05-07T20:03:13.0893904Z INFO:wheel:adding 'fbgemm_gpu/asmjit.so' 2025-05-07T20:03:13.0916610Z INFO:wheel:adding 'fbgemm_gpu/batched_unary_embeddings_ops.py' 2025-05-07T20:03:13.0917125Z INFO:wheel:adding 'fbgemm_gpu/enums.py' 2025-05-07T20:03:13.2932202Z INFO:wheel:adding 'fbgemm_gpu/fbgemm.so' 2025-05-07T20:03:13.3052890Z INFO:wheel:adding 'fbgemm_gpu/metrics.py' 2025-05-07T20:03:13.3053477Z INFO:wheel:adding 'fbgemm_gpu/permute_pooled_embedding_modules.py' 2025-05-07T20:03:13.3054196Z INFO:wheel:adding 'fbgemm_gpu/permute_pooled_embedding_modules_split.py' 2025-05-07T20:03:13.3057964Z INFO:wheel:adding 'fbgemm_gpu/quantize_comm.py' 2025-05-07T20:03:13.3060982Z INFO:wheel:adding 'fbgemm_gpu/quantize_utils.py' 2025-05-07T20:03:13.3064166Z INFO:wheel:adding 'fbgemm_gpu/runtime_monitor.py' 2025-05-07T20:03:13.3075338Z INFO:wheel:adding 'fbgemm_gpu/sparse_ops.py' 2025-05-07T20:03:13.3079178Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_configs.py' 2025-05-07T20:03:13.3082164Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_inference_converter.py' 2025-05-07T20:03:13.3083894Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_optimizer_ops.py' 2025-05-07T20:03:13.3085508Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_utils.py' 2025-05-07T20:03:13.3088252Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops.py' 2025-05-07T20:03:13.3090840Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_common.py' 2025-05-07T20:03:13.3115365Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_inference.py' 2025-05-07T20:03:13.3157174Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_training.py' 2025-05-07T20:03:13.3159714Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py' 2025-05-07T20:03:13.3161515Z INFO:wheel:adding 
'fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py' 2025-05-07T20:03:13.3163523Z INFO:wheel:adding 'fbgemm_gpu/tbe_input_multiplexer.py' 2025-05-07T20:03:13.3165174Z INFO:wheel:adding 'fbgemm_gpu/uvm.py' 2025-05-07T20:03:13.3167010Z INFO:wheel:adding 'fbgemm_gpu/config/__init__.py' 2025-05-07T20:03:13.3169251Z INFO:wheel:adding 'fbgemm_gpu/config/feature_list.py' 2025-05-07T20:03:13.3171330Z INFO:wheel:adding 'fbgemm_gpu/docs/__init__.py' 2025-05-07T20:03:13.3172743Z INFO:wheel:adding 'fbgemm_gpu/docs/common.py' 2025-05-07T20:03:13.3174753Z INFO:wheel:adding 'fbgemm_gpu/docs/examples.py' 2025-05-07T20:03:13.3177373Z INFO:wheel:adding 'fbgemm_gpu/docs/jagged_tensor_ops.py' 2025-05-07T20:03:13.3179460Z INFO:wheel:adding 'fbgemm_gpu/docs/merge_pooled_embedding_ops.py' 2025-05-07T20:03:13.3181816Z INFO:wheel:adding 'fbgemm_gpu/docs/permute_pooled_embedding_ops.py' 2025-05-07T20:03:13.3183634Z INFO:wheel:adding 'fbgemm_gpu/docs/quantize_ops.py' 2025-05-07T20:03:13.3189558Z INFO:wheel:adding 'fbgemm_gpu/docs/sparse_ops.py' 2025-05-07T20:03:13.3191601Z INFO:wheel:adding 'fbgemm_gpu/docs/version.py' 2025-05-07T20:03:13.3193676Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/__init__.py' 2025-05-07T20:03:13.3196274Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/ck_bf16_bench.py' 2025-05-07T20:03:13.3199504Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/comm_bench.py' 2025-05-07T20:03:13.3203477Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/gather_scatter_bench.py' 2025-05-07T20:03:13.3209324Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/quantize_bench.py' 2025-05-07T20:03:13.3221459Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/quantize_ops.py' 2025-05-07T20:03:13.3224812Z INFO:wheel:adding 'fbgemm_gpu/experimental/example/__init__.py' 2025-05-07T20:03:13.3370576Z INFO:wheel:adding 'fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so' 2025-05-07T20:03:13.3381874Z INFO:wheel:adding 'fbgemm_gpu/experimental/example/utils.py' 
2025-05-07T20:03:13.3384035Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py' 2025-05-07T20:03:13.3413092Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py' 2025-05-07T20:03:13.3419729Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py' 2025-05-07T20:03:13.3423861Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py' 2025-05-07T20:03:13.3426172Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/utils.py' 2025-05-07T20:03:13.3428260Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/__init__.py' 2025-05-07T20:03:15.2916361Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so' 2025-05-07T20:03:15.4917240Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/quantize.py' 2025-05-07T20:03:15.4917803Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/README.md' 2025-05-07T20:03:15.4918362Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/__init__.py' 2025-05-07T20:03:15.4920489Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/activation.py' 2025-05-07T20:03:15.4925096Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py' 2025-05-07T20:03:15.4934735Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/layers.py' 2025-05-07T20:03:15.4939474Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/shuffling.py' 2025-05-07T20:03:15.4941718Z INFO:wheel:adding 'fbgemm_gpu/quantize/__init__.py' 2025-05-07T20:03:15.4943708Z INFO:wheel:adding 'fbgemm_gpu/quantize/quantize_ops.py' 2025-05-07T20:03:15.4946037Z INFO:wheel:adding 'fbgemm_gpu/sll/__init__.py' 2025-05-07T20:03:15.4948172Z INFO:wheel:adding 'fbgemm_gpu/sll/cpu/__init__.py' 2025-05-07T20:03:15.4954611Z INFO:wheel:adding 'fbgemm_gpu/sll/cpu/cpu_sll.py' 2025-05-07T20:03:15.4957222Z INFO:wheel:adding 'fbgemm_gpu/sll/meta/__init__.py' 2025-05-07T20:03:15.4959903Z INFO:wheel:adding 'fbgemm_gpu/sll/meta/meta_sll.py' 2025-05-07T20:03:15.4962596Z 
INFO:wheel:adding 'fbgemm_gpu/sll/triton/__init__.py' 2025-05-07T20:03:15.4964352Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/common.py' 2025-05-07T20:03:15.4966350Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py' 2025-05-07T20:03:15.4969251Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py' 2025-05-07T20:03:15.4973102Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_bmm.py' 2025-05-07T20:03:15.4977226Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py' 2025-05-07T20:03:15.4979604Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py' 2025-05-07T20:03:15.4982021Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py' 2025-05-07T20:03:15.4987697Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py' 2025-05-07T20:03:15.4993193Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py' 2025-05-07T20:03:15.4995537Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py' 2025-05-07T20:03:15.4999431Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_softmax.py' 2025-05-07T20:03:15.5004929Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py' 2025-05-07T20:03:15.5007087Z INFO:wheel:adding 'fbgemm_gpu/tbe/__init__.py' 2025-05-07T20:03:15.5009341Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/__init__.py' 2025-05-07T20:03:15.5011609Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/bench_config.py' 2025-05-07T20:03:15.5016623Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/bench_runs.py' 2025-05-07T20:03:15.5019488Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/eeg_cli.py' 2025-05-07T20:03:15.5022090Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/embedding_ops_common_config.py' 2025-05-07T20:03:15.5024026Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/eval_compression.py' 2025-05-07T20:03:15.5025772Z INFO:wheel:adding 
'fbgemm_gpu/tbe/bench/reporter.py' 2025-05-07T20:03:15.5029140Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/tbe_data_config.py' 2025-05-07T20:03:15.5031994Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/tbe_data_config_loader.py' 2025-05-07T20:03:15.5034680Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py' 2025-05-07T20:03:15.5036546Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/utils.py' 2025-05-07T20:03:15.5038435Z INFO:wheel:adding 'fbgemm_gpu/tbe/cache/__init__.py' 2025-05-07T20:03:15.5040202Z INFO:wheel:adding 'fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py' 2025-05-07T20:03:15.5041991Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/__init__.py' 2025-05-07T20:03:15.5043477Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/common.py' 2025-05-07T20:03:15.5049504Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/inference.py' 2025-05-07T20:03:15.5074582Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/training.py' 2025-05-07T20:03:15.5078714Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/utils/__init__.py' 2025-05-07T20:03:15.5081835Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py' 2025-05-07T20:03:15.5083682Z INFO:wheel:adding 'fbgemm_gpu/tbe/stats/__init__.py' 2025-05-07T20:03:15.5086597Z INFO:wheel:adding 'fbgemm_gpu/tbe/stats/bench_params_reporter.py' 2025-05-07T20:03:15.5088626Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/__init__.py' 2025-05-07T20:03:15.5090292Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/common.py' 2025-05-07T20:03:15.5092146Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/offsets.py' 2025-05-07T20:03:15.5094828Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/quantize.py' 2025-05-07T20:03:15.5100678Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/requests.py' 2025-05-07T20:03:15.5103162Z INFO:wheel:adding 'fbgemm_gpu/triton/__init__.py' 2025-05-07T20:03:15.5105057Z INFO:wheel:adding 'fbgemm_gpu/triton/common.py' 2025-05-07T20:03:15.5112753Z INFO:wheel:adding 'fbgemm_gpu/triton/quantize.py' 2025-05-07T20:03:15.5117487Z INFO:wheel:adding 'fbgemm_gpu/triton/quantize_ref.py' 
2025-05-07T20:03:15.5119586Z INFO:wheel:adding 'fbgemm_gpu/triton/jagged/__init__.py' 2025-05-07T20:03:15.5127615Z INFO:wheel:adding 'fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py' 2025-05-07T20:03:15.5130098Z INFO:wheel:adding 'fbgemm_gpu/utils/__init__.py' 2025-05-07T20:03:15.5132490Z INFO:wheel:adding 'fbgemm_gpu/utils/filestore.py' 2025-05-07T20:03:15.5134261Z INFO:wheel:adding 'fbgemm_gpu/utils/loader.py' 2025-05-07T20:03:15.5136574Z INFO:wheel:adding 'fbgemm_gpu/utils/torch_library.py' 2025-05-07T20:03:15.5139785Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/METADATA' 2025-05-07T20:03:15.5140867Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/WHEEL' 2025-05-07T20:03:15.5141885Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/top_level.txt' 2025-05-07T20:03:15.5207748Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/RECORD' 2025-05-07T20:03:15.5208456Z INFO:root:removing _skbuild/linux-x86_64-3.9/setuptools/bdist.linux-x86_64/wheel 2025-05-07T20:03:15.6257586Z ╒════════════════════════════╤════════════════════════════════════════════════╕ 2025-05-07T20:03:15.6259171Z │ │ Version │ 2025-05-07T20:03:15.6260062Z ╞════════════════════════════╪════════════════════════════════════════════════╡ 2025-05-07T20:03:15.6260593Z │ PyTorch │ 2.8.0.dev20250507+cu128 │ 2025-05-07T20:03:15.6261143Z ├────────────────────────────┼────────────────────────────────────────────────┤ 2025-05-07T20:03:15.6261873Z │ CUDA (Declared by PyTorch) │ 12.8 │ 2025-05-07T20:03:15.6262462Z ├────────────────────────────┼────────────────────────────────────────────────┤ 2025-05-07T20:03:15.6262991Z │ CUDA (Actual) │ nvcc: NVIDIA (R) Cuda compiler driver │ 2025-05-07T20:03:15.6263734Z │ │ Copyright (c) 2005-2025 NVIDIA Corporation │ 2025-05-07T20:03:15.6264186Z │ │ Built on Wed_Jan_15_19:20:09_PST_2025 │ 2025-05-07T20:03:15.6264639Z │ │ Cuda compilation tools, release 12.8, V12.8.61 │ 2025-05-07T20:03:15.6265161Z │ │ Build 
cuda_12.8.r12.8/compiler.35404655_0 │ 2025-05-07T20:03:15.6265650Z ╘════════════════════════════╧════════════════════════════════════════════════╛ 2025-05-07T20:03:25.3260196Z Successfully built fbgemm_gpu_genai_nightly-2025.5.7-cp39-cp39-manylinux_2_28_x86_64.whl 2025-05-07T20:03:29.5480272Z 2025-05-07T20:03:29.6791395Z ################################################################################ 2025-05-07T20:03:29.6793449Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:29.6795281Z [CHECK] Listing out library size: 2025-05-07T20:03:29.6821070Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:29.6821721Z 2025-05-07T20:03:29.6933002Z 91 ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:29.6934344Z 2025-05-07T20:03:29.6966107Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:29.6970386Z + objdump -TC ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:03:29.6971540Z 2025-05-07T20:03:29.8068895Z GLIBC_2.2.5 2025-05-07T20:03:29.8069539Z GLIBC_2.3 2025-05-07T20:03:29.8073585Z GLIBC_2.14 2025-05-07T20:03:29.8073763Z 2025-05-07T20:03:29.8073938Z 2025-05-07T20:03:29.8074999Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:29.8076295Z + objdump -TC ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:03:29.8077069Z 2025-05-07T20:03:29.8296743Z 
GLIBCXX_3.4 2025-05-07T20:03:29.8297638Z GLIBCXX_3.4.9 2025-05-07T20:03:29.8298270Z GLIBCXX_3.4.11 2025-05-07T20:03:29.8298867Z GLIBCXX_3.4.18 2025-05-07T20:03:29.8299425Z GLIBCXX_3.4.21 2025-05-07T20:03:29.8300001Z GLIBCXX_3.4.29 2025-05-07T20:03:29.8300346Z 2025-05-07T20:03:29.8300361Z 2025-05-07T20:03:29.8828103Z + nm -gDC ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so > /tmp/tmp.MVw7XlsOo5.symbols.txt 2025-05-07T20:03:29.8829991Z 2025-05-07T20:03:29.9050255Z 2025-05-07T20:03:29.9326940Z [CHECK] Total Number of symbols: 2736 2025-05-07T20:03:29.9352798Z [CHECK] Number of fbgemm symbols: 676 2025-05-07T20:03:29.9370018Z + nm -gDCu ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so > /tmp/tmp.wDgsH0e8Ph.usymbols.txt 2025-05-07T20:03:29.9370662Z 2025-05-07T20:03:29.9399721Z 2025-05-07T20:03:29.9442715Z [CHECK] Listing out undefined symbols (249 total): 2025-05-07T20:03:29.9460357Z U VTT for std::__cxx11::basic_ostringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:29.9461417Z U VTT for std::__cxx11::basic_stringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:29.9462099Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:03:29.9462687Z U __assert_fail@GLIBC_2.2.5 2025-05-07T20:03:29.9463083Z U __cudaPopCallConfiguration@libcudart.so.12 2025-05-07T20:03:29.9463556Z U __cudaPushCallConfiguration@libcudart.so.12 2025-05-07T20:03:29.9463961Z U __cudaRegisterFatBinary@libcudart.so.12 2025-05-07T20:03:29.9464488Z U __cudaRegisterFatBinaryEnd@libcudart.so.12 2025-05-07T20:03:29.9464918Z U __cudaRegisterFunction@libcudart.so.12 2025-05-07T20:03:29.9465294Z U __cudaRegisterVar@libcudart.so.12 2025-05-07T20:03:29.9465711Z U __cudaUnregisterFatBinary@libcudart.so.12 2025-05-07T20:03:29.9466162Z U __cxa_allocate_exception@CXXABI_1.3 2025-05-07T20:03:29.9466527Z U __cxa_atexit@GLIBC_2.2.5 2025-05-07T20:03:29.9466860Z U __cxa_begin_catch@CXXABI_1.3 2025-05-07T20:03:29.9467227Z U 
__cxa_end_catch@CXXABI_1.3 2025-05-07T20:03:29.9467948Z U __cxa_free_exception@CXXABI_1.3 2025-05-07T20:03:29.9468383Z U __cxa_guard_abort@CXXABI_1.3 2025-05-07T20:03:29.9468895Z U __cxa_guard_acquire@CXXABI_1.3 2025-05-07T20:03:29.9469253Z U __cxa_guard_release@CXXABI_1.3 2025-05-07T20:03:29.9469628Z U __cxa_rethrow@CXXABI_1.3 2025-05-07T20:03:29.9469971Z U __cxa_thread_atexit@CXXABI_1.3.7 2025-05-07T20:03:29.9470341Z U __cxa_throw@CXXABI_1.3 2025-05-07T20:03:29.9470677Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:03:29.9471037Z U __tls_get_addr@GLIBC_2.3 2025-05-07T20:03:29.9471358Z U __udivti3@GCC_3.0 2025-05-07T20:03:29.9471686Z U __xstat@GLIBC_2.2.5 2025-05-07T20:03:29.9472069Z U at::CUDAGeneratorImpl::device_type() 2025-05-07T20:03:29.9472500Z U at::CUDAGeneratorImpl::philox_cuda_state(unsigned long) 2025-05-07T20:03:29.9472957Z U at::TensorMaker::make_tensor() 2025-05-07T20:03:29.9473424Z U at::_ops::add__Tensor::call(at::Tensor&, at::Tensor const&, c10::Scalar const&) 2025-05-07T20:03:29.9474054Z U at::_ops::div__Scalar::call(at::Tensor&, c10::Scalar const&) 2025-05-07T20:03:29.9474995Z U at::_ops::empty_like::call(at::Tensor const&, std::optional, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:03:29.9476397Z U at::_ops::empty_memory_format::call(c10::ArrayRef, std::optional, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:03:29.9477601Z U at::_ops::expand::call(at::Tensor const&, c10::ArrayRef, bool) 2025-05-07T20:03:29.9478202Z U at::_ops::index_select::call(at::Tensor const&, long, at::Tensor const&) 2025-05-07T20:03:29.9478740Z U at::_ops::norm_Scalar::call(at::Tensor const&, c10::Scalar const&) 2025-05-07T20:03:29.9479343Z U at::_ops::scatter_add_::call(at::Tensor&, long, at::Tensor const&, at::Tensor const&) 2025-05-07T20:03:29.9479932Z U at::_ops::select_int::call(at::Tensor const&, long, c10::SymInt) 2025-05-07T20:03:29.9480492Z U at::_ops::split_sizes::call(at::Tensor const&, c10::ArrayRef, 
long) 2025-05-07T20:03:29.9481365Z U at::_ops::sum_dim_IntList::call(at::Tensor const&, c10::OptionalArrayRef, bool, std::optional) 2025-05-07T20:03:29.9482184Z U at::_ops::to_dtype::call(at::Tensor const&, c10::ScalarType, bool, bool, std::optional) 2025-05-07T20:03:29.9483268Z U at::_ops::to_dtype_layout::call(at::Tensor const&, std::optional, std::optional, std::optional, std::optional, bool, bool, std::optional) 2025-05-07T20:03:29.9484257Z U at::_ops::unsqueeze::call(at::Tensor const&, long) 2025-05-07T20:03:29.9484739Z U at::_ops::view::call(at::Tensor const&, c10::ArrayRef) 2025-05-07T20:03:29.9485574Z U at::_ops::zeros::call(c10::ArrayRef, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:03:29.9486378Z U at::cuda::detail::getDefaultCUDAGenerator(signed char) 2025-05-07T20:03:29.9486805Z U at::cuda::getCurrentDeviceProperties() 2025-05-07T20:03:29.9487258Z U at::tensor(c10::ArrayRef, c10::TensorOptions const&) 2025-05-07T20:03:29.9487722Z U bcmp@GLIBC_2.2.5 2025-05-07T20:03:29.9488097Z U c10::AutogradMetaInterface::~AutogradMetaInterface() 2025-05-07T20:03:29.9488604Z U c10::BFloat16* at::TensorBase::data_ptr() const 2025-05-07T20:03:29.9489116Z U c10::BFloat16* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:29.9489575Z U c10::BoolType::get() 2025-05-07T20:03:29.9490205Z U c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) 2025-05-07T20:03:29.9490817Z U c10::Error::what() const 2025-05-07T20:03:29.9491308Z U c10::Float8_e4m3fn* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:29.9491782Z U c10::FloatType::get() 2025-05-07T20:03:29.9492142Z U c10::GeneratorImpl::device() const 2025-05-07T20:03:29.9492498Z U c10::IValue::isTensorList() const 2025-05-07T20:03:29.9492913Z U c10::IValue::reportToTensorTypeError() const 2025-05-07T20:03:29.9493309Z U c10::IntType::get() 2025-05-07T20:03:29.9494002Z U c10::ListType::get(std::__cxx11::basic_string, std::allocator > const&, 
c10::Type::SingletonOrSharedTypePtr) 2025-05-07T20:03:29.9494992Z U c10::MessageLogger::MessageLogger(char const*, int, int) 2025-05-07T20:03:29.9495454Z U c10::MessageLogger::~MessageLogger() 2025-05-07T20:03:29.9495971Z U c10::OptionalType::get(c10::Type::SingletonOrSharedTypePtr) 2025-05-07T20:03:29.9496470Z U c10::ScalarTypeType::get() 2025-05-07T20:03:29.9496866Z U c10::StorageImpl::throw_data_ptr_access_error() const 2025-05-07T20:03:29.9497378Z U c10::StringType::get() 2025-05-07T20:03:29.9497757Z U c10::SymBool::guard_bool(char const*, long) const 2025-05-07T20:03:29.9498279Z U c10::SymFloat::guard_float(char const*, long) const 2025-05-07T20:03:29.9499002Z U c10::SymInt::SymInt(c10::intrusive_ptr >) 2025-05-07T20:03:29.9499695Z U c10::SymInt::guard_int(char const*, long) const 2025-05-07T20:03:29.9500113Z U c10::SymInt::promote_to_negative() 2025-05-07T20:03:29.9500475Z U c10::SymInt::toSymNode() const 2025-05-07T20:03:29.9500903Z U c10::SymbolicShapeMeta::init_is_contiguous() const 2025-05-07T20:03:29.9501673Z U c10::TensorImpl::set_autograd_meta(std::unique_ptr >) 2025-05-07T20:03:29.9502419Z U c10::TensorImpl::throw_data_ptr_access_error() const 2025-05-07T20:03:29.9502845Z U c10::TensorType::get() 2025-05-07T20:03:29.9503195Z U c10::UndefinedTensorImpl::_singleton 2025-05-07T20:03:29.9504214Z U c10::Warning::Warning(std::variant, c10::SourceLocation const&, std::__cxx11::basic_string, std::allocator >, bool) 2025-05-07T20:03:29.9505289Z U c10::cuda::CUDACachingAllocator::allocator 2025-05-07T20:03:29.9505684Z U c10::cuda::CUDAStream::stream() const 2025-05-07T20:03:29.9506099Z U c10::cuda::ExchangeDevice(signed char) 2025-05-07T20:03:29.9506502Z U c10::cuda::GetDevice(signed char*) 2025-05-07T20:03:29.9506916Z U c10::cuda::MaybeSetDevice(signed char) 2025-05-07T20:03:29.9507321Z U c10::cuda::SetDevice(signed char) 2025-05-07T20:03:29.9507823Z U c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) 
2025-05-07T20:03:29.9508390Z U c10::cuda::current_device() 2025-05-07T20:03:29.9508731Z U c10::cuda::device_count() 2025-05-07T20:03:29.9509132Z U c10::cuda::getCurrentCUDAStream(signed char) 2025-05-07T20:03:29.9509671Z U c10::cuda::getDefaultCUDAStream(signed char) 2025-05-07T20:03:29.9510069Z U c10::cuda::getStreamFromPool(bool, signed char) 2025-05-07T20:03:29.9510471Z U c10::cuda::getStreamFromPool(int, signed char) 2025-05-07T20:03:29.9510874Z U c10::cuda::setCurrentCUDAStream(c10::cuda::CUDAStream) 2025-05-07T20:03:29.9511271Z U c10::cuda::warn_or_error_on_sync() 2025-05-07T20:03:29.9511925Z U c10::detail::ListImpl::ListImpl(std::vector >, c10::Type::SingletonOrSharedTypePtr) 2025-05-07T20:03:29.9513003Z U c10::detail::infer_schema::make_function_schema(c10::ArrayRef, c10::ArrayRef) 2025-05-07T20:03:29.9514162Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) 2025-05-07T20:03:29.9515023Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) 2025-05-07T20:03:29.9516037Z U c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*) 2025-05-07T20:03:29.9517083Z U c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string, std::allocator > const&) 2025-05-07T20:03:29.9517873Z U c10::get_default_dtype() 2025-05-07T20:03:29.9518367Z U c10::impl::ExcludeDispatchKeyGuard::ExcludeDispatchKeyGuard(c10::DispatchKeySet) 2025-05-07T20:03:29.9518967Z U c10::impl::ExcludeDispatchKeyGuard::~ExcludeDispatchKeyGuard() 2025-05-07T20:03:29.9519404Z U c10::impl::GPUTrace::gpuTraceState 2025-05-07T20:03:29.9519799Z U c10::impl::GPUTrace::haveState 2025-05-07T20:03:29.9520195Z U c10::impl::cow::is_cow_data_ptr(c10::DataPtr const&) 2025-05-07T20:03:29.9520668Z U c10::impl::cow::materialize_cow_storage(c10::StorageImpl&) 2025-05-07T20:03:29.9521068Z U 
c10::impl::device_guard_impl_registry 2025-05-07T20:03:29.9521473Z U c10::operator*(c10::SymInt const&, int) 2025-05-07T20:03:29.9521873Z U c10::operator-(c10::SymInt const&, int) 2025-05-07T20:03:29.9522242Z U c10::operator-(c10::SymInt const&, long) 2025-05-07T20:03:29.9522671Z U c10::operator<<(std::ostream&, c10::Device const&) 2025-05-07T20:03:29.9523089Z U c10::operator<<(std::ostream&, c10::DeviceType) 2025-05-07T20:03:29.9523511Z U c10::throwNullDataPtrError() 2025-05-07T20:03:29.9523885Z U c10::warn(c10::Warning const&) 2025-05-07T20:03:29.9524233Z U c10::warnDeprecatedDataPtr() 2025-05-07T20:03:29.9524974Z U c10d::getNcclErrorDetailStr(ncclResult_t, std::optional, std::allocator > >) 2025-05-07T20:03:29.9525790Z U c10d::ncclGetErrorWithVersion[abi:cxx11](ncclResult_t) 2025-05-07T20:03:29.9526294Z U caffe2::TypeMeta::error_unsupported_typemeta(caffe2::TypeMeta) 2025-05-07T20:03:29.9526757Z U caffe2::TypeMeta::typeMetaDatas() 2025-05-07T20:03:29.9527074Z U cublasLtCreate 2025-05-07T20:03:29.9527359Z U cublasLtMatmul 2025-05-07T20:03:29.9527655Z U cublasLtMatmulAlgoGetHeuristic 2025-05-07T20:03:29.9527995Z U cublasLtMatmulDescCreate 2025-05-07T20:03:29.9528318Z U cublasLtMatmulDescSetAttribute 2025-05-07T20:03:29.9528698Z U cublasLtMatmulPreferenceCreate 2025-05-07T20:03:29.9529050Z U cublasLtMatmulPreferenceSetAttribute 2025-05-07T20:03:29.9529409Z U cublasLtMatrixLayoutCreate 2025-05-07T20:03:29.9529766Z U cudaDeviceGetAttribute@libcudart.so.12 2025-05-07T20:03:29.9530123Z U cudaDeviceSynchronize@libcudart.so.12 2025-05-07T20:03:29.9530536Z U cudaEventCreateWithFlags@libcudart.so.12 2025-05-07T20:03:29.9530916Z U cudaEventDestroy@libcudart.so.12 2025-05-07T20:03:29.9531318Z U cudaEventElapsedTime@libcudart.so.12 2025-05-07T20:03:29.9531682Z U cudaEventQuery@libcudart.so.12 2025-05-07T20:03:29.9532070Z U cudaEventRecord@libcudart.so.12 2025-05-07T20:03:29.9532434Z U cudaEventSynchronize@libcudart.so.12 2025-05-07T20:03:29.9532810Z U cudaFree@libcudart.so.12 
2025-05-07T20:03:29.9533196Z U cudaFuncSetAttribute@libcudart.so.12 2025-05-07T20:03:29.9533548Z U cudaGetDevice@libcudart.so.12 2025-05-07T20:03:29.9533946Z U cudaGetDeviceProperties_v2@libcudart.so.12 2025-05-07T20:03:29.9534348Z U cudaGetDriverEntryPoint@libcudart.so.12 2025-05-07T20:03:29.9534752Z U cudaGetErrorName@libcudart.so.12 2025-05-07T20:03:29.9535117Z U cudaGetErrorString@libcudart.so.12 2025-05-07T20:03:29.9535538Z U cudaGetLastError@libcudart.so.12 2025-05-07T20:03:29.9535946Z U cudaIpcGetMemHandle@libcudart.so.12 2025-05-07T20:03:29.9536325Z U cudaIpcOpenMemHandle@libcudart.so.12 2025-05-07T20:03:29.9536726Z U cudaLaunchCooperativeKernel@libcudart.so.12 2025-05-07T20:03:29.9537163Z U cudaLaunchKernel@libcudart.so.12 2025-05-07T20:03:29.9537694Z U cudaLaunchKernelExC@libcudart.so.12 2025-05-07T20:03:29.9538105Z U cudaMalloc@libcudart.so.12 2025-05-07T20:03:29.9538443Z U cudaMemcpy@libcudart.so.12 2025-05-07T20:03:29.9538772Z U cudaMemcpyAsync@libcudart.so.12 2025-05-07T20:03:29.9539130Z U cudaMemsetAsync@libcudart.so.12 2025-05-07T20:03:29.9539483Z U cudaStreamQuery@libcudart.so.12 2025-05-07T20:03:29.9539835Z U cudaStreamSynchronize@libcudart.so.12 2025-05-07T20:03:29.9540209Z U cudaStreamWaitEvent@libcudart.so.12 2025-05-07T20:03:29.9540535Z U exit@GLIBC_2.2.5 2025-05-07T20:03:29.9540832Z U fclose@GLIBC_2.2.5 2025-05-07T20:03:29.9541117Z U fflush@GLIBC_2.2.5 2025-05-07T20:03:29.9541466Z U float* at::TensorBase::data_ptr() const 2025-05-07T20:03:29.9541894Z U float* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:29.9542263Z U fopen@GLIBC_2.2.5 2025-05-07T20:03:29.9542566Z U fprintf@GLIBC_2.2.5 2025-05-07T20:03:29.9542851Z U fread@GLIBC_2.2.5 2025-05-07T20:03:29.9543144Z U fwrite@GLIBC_2.2.5 2025-05-07T20:03:29.9543462Z U int* at::TensorBase::data_ptr() const 2025-05-07T20:03:29.9543891Z U int* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:29.9544345Z U long c10::detail::maybe_wrap_dim_slow(long, long, bool) 
2025-05-07T20:03:29.9544778Z U long* at::TensorBase::data_ptr() const 2025-05-07T20:03:29.9545133Z U memcpy@GLIBC_2.14 2025-05-07T20:03:29.9545447Z U memmove@GLIBC_2.2.5 2025-05-07T20:03:29.9545751Z U memset@GLIBC_2.2.5 2025-05-07T20:03:29.9546030Z U ncclAllGather 2025-05-07T20:03:29.9546311Z U ncclAllReduce 2025-05-07T20:03:29.9546577Z U ncclCommInitRank 2025-05-07T20:03:29.9546870Z U ncclGetUniqueId 2025-05-07T20:03:29.9547174Z U ncclReduceScatter 2025-05-07T20:03:29.9547492Z U operator delete(void*)@GLIBCXX_3.4 2025-05-07T20:03:29.9547853Z U operator new(unsigned long)@GLIBCXX_3.4 2025-05-07T20:03:29.9548183Z U printf@GLIBC_2.2.5 2025-05-07T20:03:29.9548564Z U signed char* at::TensorBase::data_ptr() const 2025-05-07T20:03:29.9549051Z U signed char* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:29.9549857Z U std::__cxx11::basic_ostringstream, std::allocator >::basic_ostringstream() 2025-05-07T20:03:29.9550681Z U std::__cxx11::basic_ostringstream, std::allocator >::str() const &@GLIBCXX_3.4.29 2025-05-07T20:03:29.9551533Z U std::__cxx11::basic_ostringstream, std::allocator >::~basic_ostringstream()@GLIBCXX_3.4.21 2025-05-07T20:03:29.9552374Z U std::__cxx11::basic_stringstream, std::allocator >::basic_stringstream() 2025-05-07T20:03:29.9553195Z U std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const@GLIBCXX_3.4.18 2025-05-07T20:03:29.9553817Z U std::__throw_bad_alloc()@GLIBCXX_3.4 2025-05-07T20:03:29.9554166Z U std::__throw_bad_array_new_length() 2025-05-07T20:03:29.9555726Z U std::__throw_bad_cast()@GLIBCXX_3.4 2025-05-07T20:03:29.9556116Z U std::__throw_length_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:29.9556512Z U std::__throw_logic_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:29.9556910Z U std::__throw_system_error(int)@GLIBCXX_3.4.11 2025-05-07T20:03:29.9557410Z U std::basic_ios >::clear(std::_Ios_Iostate)@GLIBCXX_3.4 2025-05-07T20:03:29.9558353Z U std::basic_ostream >& std::__ostream_insert 
>(std::basic_ostream >&, char const*, long)@GLIBCXX_3.4.9 2025-05-07T20:03:29.9559514Z U std::basic_ostream >& std::endl >(std::basic_ostream >&)@GLIBCXX_3.4 2025-05-07T20:03:29.9560618Z U std::basic_ostream >& std::operator<< >(std::basic_ostream >&, char const*)@GLIBCXX_3.4 2025-05-07T20:03:29.9561774Z U std::basic_ostream >& std::operator<< >(std::basic_ostream >&, unsigned char const*)@GLIBCXX_3.4 2025-05-07T20:03:29.9562539Z U std::cerr@GLIBCXX_3.4 2025-05-07T20:03:29.9562854Z U std::cout@GLIBCXX_3.4 2025-05-07T20:03:29.9563209Z U std::ctype::_M_widen_init() const@GLIBCXX_3.4.11 2025-05-07T20:03:29.9563621Z U std::exception::what() const@GLIBCXX_3.4 2025-05-07T20:03:29.9563981Z U std::exception::~exception()@GLIBCXX_3.4 2025-05-07T20:03:29.9564349Z U std::ios_base::Init::Init()@GLIBCXX_3.4 2025-05-07T20:03:29.9564726Z U std::ios_base::Init::~Init()@GLIBCXX_3.4 2025-05-07T20:03:29.9565086Z U std::ios_base::~ios_base()@GLIBCXX_3.4 2025-05-07T20:03:29.9565443Z U std::locale::~locale()@GLIBCXX_3.4 2025-05-07T20:03:29.9565859Z U std::logic_error::logic_error(char const*)@GLIBCXX_3.4.21 2025-05-07T20:03:29.9566287Z U std::logic_error::~logic_error()@GLIBCXX_3.4 2025-05-07T20:03:29.9566705Z U std::ostream& std::ostream::_M_insert(long)@GLIBCXX_3.4.9 2025-05-07T20:03:29.9567249Z U std::ostream& std::ostream::_M_insert(unsigned long)@GLIBCXX_3.4.9 2025-05-07T20:03:29.9568196Z U std::ostream& std::ostream::_M_insert(void const*)@GLIBCXX_3.4.9 2025-05-07T20:03:29.9568654Z U std::ostream::flush()@GLIBCXX_3.4 2025-05-07T20:03:29.9569015Z U std::ostream::operator<<(int)@GLIBCXX_3.4 2025-05-07T20:03:29.9569372Z U std::ostream::put(char)@GLIBCXX_3.4 2025-05-07T20:03:29.9569827Z U std::runtime_error::runtime_error(char const*)@GLIBCXX_3.4.21 2025-05-07T20:03:29.9570584Z U std::runtime_error::runtime_error(std::__cxx11::basic_string, std::allocator > const&)@GLIBCXX_3.4.21 2025-05-07T20:03:29.9571274Z U std::runtime_error::~runtime_error()@GLIBCXX_3.4 
2025-05-07T20:03:29.9571649Z U std::terminate()@GLIBCXX_3.4 2025-05-07T20:03:29.9571958Z U stderr@GLIBC_2.2.5 2025-05-07T20:03:29.9572260Z U strlen@GLIBC_2.2.5 2025-05-07T20:03:29.9572593Z U torch::CppFunction::~CppFunction() 2025-05-07T20:03:29.9573429Z U torch::Library::Library(torch::Library::Kind, std::__cxx11::basic_string, std::allocator >, std::optional, char const*, unsigned int) 2025-05-07T20:03:29.9574628Z U torch::Library::_def(c10::FunctionSchema&&, c10::OperatorName*, std::vector > const&, torch::_RegisterOrVerify) & 2025-05-07T20:03:29.9575544Z U torch::Library::_impl(char const*, torch::CppFunction&&, torch::_RegisterOrVerify) & 2025-05-07T20:03:29.9576450Z U torch::cuda::nccl::all2all(std::vector >&, std::vector >&, void*, c10::cuda::CUDAStream&) 2025-05-07T20:03:29.9577505Z U torch::cuda::nccl::all2all_single_equal_split(at::Tensor&, at::Tensor&, int, void*, c10::cuda::CUDAStream&) 2025-05-07T20:03:29.9578373Z U torch::jit::parseSchema(std::__cxx11::basic_string, std::allocator > const&, bool) 2025-05-07T20:03:29.9578990Z U typeinfo for c10::Error 2025-05-07T20:03:29.9579398Z U typeinfo for std::exception@GLIBCXX_3.4 2025-05-07T20:03:29.9579796Z U typeinfo for std::logic_error@GLIBCXX_3.4 2025-05-07T20:03:29.9580223Z U typeinfo for std::runtime_error@GLIBCXX_3.4 2025-05-07T20:03:29.9580749Z U unsigned char* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:29.9581212Z U usleep@GLIBC_2.2.5 2025-05-07T20:03:29.9581611Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:03:29.9582038Z U vtable for __cxxabiv1::__function_type_info@CXXABI_1.3 2025-05-07T20:03:29.9582488Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:03:29.9582859Z U vtable for c10::Error 2025-05-07T20:03:29.9583416Z U vtable for std::__cxx11::basic_stringbuf, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:29.9584112Z U vtable for std::basic_streambuf >@GLIBCXX_3.4 2025-05-07T20:03:29.9584635Z U vtable for torch::autograd::AutogradMeta 
2025-05-07T20:03:29.9584998Z w _ITM_deregisterTMCloneTable 2025-05-07T20:03:29.9585321Z w _ITM_registerTMCloneTable 2025-05-07T20:03:29.9585670Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:03:29.9585966Z w __gmon_start__ 2025-05-07T20:03:29.9586299Z w __pthread_key_create 2025-05-07T20:03:29.9586624Z w pthread_mutex_lock@GLIBC_2.2.5 2025-05-07T20:03:29.9586955Z w pthread_mutex_unlock@GLIBC_2.2.5 2025-05-07T20:03:29.9587336Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:03:29.9588242Z + ldd ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:29.9588695Z 2025-05-07T20:03:29.9588820Z linux-vdso.so.1 (0x00007ffd24ae2000) 2025-05-07T20:03:29.9589131Z libtorch.so => not found 2025-05-07T20:03:29.9589381Z libc10.so => not found 2025-05-07T20:03:29.9589643Z libc10_cuda.so => not found 2025-05-07T20:03:29.9589906Z libnccl.so.2 => not found 2025-05-07T20:03:29.9590194Z libtorch_cpu.so => not found 2025-05-07T20:03:29.9590483Z libtorch_cuda.so => not found 2025-05-07T20:03:29.9590796Z libcudart.so.12 => not found 2025-05-07T20:03:29.9591128Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f6e8df9c000) 2025-05-07T20:03:29.9591566Z libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f6e94046000) 2025-05-07T20:03:29.9591970Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f6e94018000) 2025-05-07T20:03:29.9592369Z libc.so.6 => /lib64/libc.so.6 (0x00007f6e8dd94000) 2025-05-07T20:03:29.9592745Z /lib64/ld-linux-x86-64.so.2 (0x00007f6e940a2000) 2025-05-07T20:03:29.9593207Z libm.so.6 => /lib64/libm.so.6 (0x00007f6e93f3b000) 2025-05-07T20:03:29.9593434Z 2025-05-07T20:03:29.9593553Z [CHECK] Displaying ELF information: 2025-05-07T20:03:29.9594091Z + readelf -d ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:29.9594557Z 2025-05-07T20:03:29.9774252Z 2025-05-07T20:03:29.9774723Z Dynamic section at offset 0x5ae3168 contains 38 entries: 
2025-05-07T20:03:29.9775143Z Tag Type Name/Value 2025-05-07T20:03:29.9775756Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:03:29.9776286Z 0x0000000000000001 (NEEDED) Shared library: [libc10.so] 2025-05-07T20:03:29.9776796Z 0x0000000000000001 (NEEDED) Shared library: [libc10_cuda.so] 2025-05-07T20:03:29.9777560Z 0x0000000000000001 (NEEDED) Shared library: [libnccl.so.2] 2025-05-07T20:03:29.9778099Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:03:29.9778631Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:03:29.9779169Z 0x0000000000000001 (NEEDED) Shared library: [libcudart.so.12] 2025-05-07T20:03:29.9779696Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:03:29.9780237Z 0x0000000000000001 (NEEDED) Shared library: [libgomp.so.1] 2025-05-07T20:03:29.9780760Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:03:29.9781352Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:03:29.9781849Z 0x0000000000000001 (NEEDED) Shared library: [ld-linux-x86-64.so.2] 2025-05-07T20:03:29.9782394Z 0x000000000000000e (SONAME) Library soname: [fbgemm_gpu_experimental_gen_ai.so] 2025-05-07T20:03:29.9782860Z 0x000000000000000c (INIT) 0x15d000 2025-05-07T20:03:29.9783175Z 0x000000000000000d (FINI) 0x5089fc 2025-05-07T20:03:29.9783509Z 0x0000000000000019 (INIT_ARRAY) 0x5ae0d28 2025-05-07T20:03:29.9783860Z 0x000000000000001b (INIT_ARRAYSZ) 1136 (bytes) 2025-05-07T20:03:29.9784198Z 0x000000000000001a (FINI_ARRAY) 0x5ae1198 2025-05-07T20:03:29.9784537Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:29.9784923Z 0x000000006ffffef5 (GNU_HASH) 0x238 2025-05-07T20:03:29.9785250Z 0x0000000000000005 (STRTAB) 0x141b8 2025-05-07T20:03:29.9785562Z 0x0000000000000006 (SYMTAB) 0x4120 2025-05-07T20:03:29.9785916Z 0x000000000000000a (STRSZ) 1239381 (bytes) 2025-05-07T20:03:29.9786310Z 0x000000000000000b (SYMENT) 24 (bytes) 
2025-05-07T20:03:29.9786659Z 0x0000000000000003 (PLTGOT) 0x5ae4418 2025-05-07T20:03:29.9787019Z 0x0000000000000002 (PLTRELSZ) 44880 (bytes) 2025-05-07T20:03:29.9787356Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:03:29.9787682Z 0x0000000000000017 (JMPREL) 0x151300 2025-05-07T20:03:29.9788034Z 0x0000000000000007 (RELA) 0x144190 2025-05-07T20:03:29.9788383Z 0x0000000000000008 (RELASZ) 53616 (bytes) 2025-05-07T20:03:29.9788718Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:03:29.9789044Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:03:29.9789352Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:03:29.9789701Z 0x000000006ffffffe (VERNEED) 0x144070 2025-05-07T20:03:29.9790029Z 0x000000006fffffff (VERNEEDNUM) 5 2025-05-07T20:03:29.9790502Z 0x000000006ffffff0 (VERSYM) 0x142b0e 2025-05-07T20:03:29.9790852Z 0x000000006ffffff9 (RELACOUNT) 420 2025-05-07T20:03:29.9791167Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:03:29.9791386Z 2025-05-07T20:03:29.9791503Z ################################################################################ 2025-05-07T20:03:29.9791731Z 2025-05-07T20:03:29.9791735Z 2025-05-07T20:03:29.9791887Z ################################################################################ 2025-05-07T20:03:29.9792537Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:29.9793183Z [CHECK] Listing out library size: 2025-05-07T20:03:29.9793772Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:29.9794292Z 2025-05-07T20:03:29.9794664Z 1 ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:29.9795102Z 2025-05-07T20:03:29.9795644Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:29.9796909Z + 
objdump -TC ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:03:29.9797670Z 2025-05-07T20:03:29.9858671Z GLIBC_2.2.5 2025-05-07T20:03:29.9859975Z GLIBC_2.14 2025-05-07T20:03:29.9860848Z 2025-05-07T20:03:29.9860865Z 2025-05-07T20:03:29.9862129Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:29.9863519Z + objdump -TC ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:03:29.9864322Z 2025-05-07T20:03:29.9928648Z GLIBCXX_3.4 2025-05-07T20:03:29.9929323Z GLIBCXX_3.4.9 2025-05-07T20:03:29.9929676Z GLIBCXX_3.4.21 2025-05-07T20:03:29.9929808Z 2025-05-07T20:03:29.9929813Z 2025-05-07T20:03:29.9956456Z + nm -gDC ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so > /tmp/tmp.1qBIT2cPvD.symbols.txt 2025-05-07T20:03:29.9957100Z 2025-05-07T20:03:29.9976657Z 2025-05-07T20:03:30.0008909Z [CHECK] Total Number of symbols: 154 2025-05-07T20:03:30.0024764Z [CHECK] Number of fbgemm symbols: 15 2025-05-07T20:03:30.0048279Z + nm -gDCu ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so > /tmp/tmp.fXq27UnqoG.usymbols.txt 2025-05-07T20:03:30.0049919Z 2025-05-07T20:03:30.0068872Z 2025-05-07T20:03:30.0099805Z [CHECK] Listing out undefined symbols (76 total): 2025-05-07T20:03:30.0118232Z U VTT for std::__cxx11::basic_ostringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:30.0120041Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:03:30.0121441Z U __cudaPopCallConfiguration@libcudart.so.12 2025-05-07T20:03:30.0122616Z U __cudaPushCallConfiguration@libcudart.so.12 2025-05-07T20:03:30.0123261Z U 
__cudaRegisterFatBinary@libcudart.so.12 2025-05-07T20:03:30.0123634Z U __cudaRegisterFatBinaryEnd@libcudart.so.12 2025-05-07T20:03:30.0124124Z U __cudaRegisterFunction@libcudart.so.12 2025-05-07T20:03:30.0124492Z U __cudaRegisterVar@libcudart.so.12 2025-05-07T20:03:30.0124887Z U __cudaUnregisterFatBinary@libcudart.so.12 2025-05-07T20:03:30.0125257Z U __cxa_allocate_exception@CXXABI_1.3 2025-05-07T20:03:30.0125616Z U __cxa_atexit@GLIBC_2.2.5 2025-05-07T20:03:30.0125957Z U __cxa_begin_catch@CXXABI_1.3 2025-05-07T20:03:30.0126281Z U __cxa_free_exception@CXXABI_1.3 2025-05-07T20:03:30.0126619Z U __cxa_throw@CXXABI_1.3 2025-05-07T20:03:30.0126929Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:03:30.0127396Z U at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) 2025-05-07T20:03:30.0128062Z U at::_ops::to_dtype::call(at::Tensor const&, c10::ScalarType, bool, bool, std::optional) 2025-05-07T20:03:30.0128933Z U at::_ops::zeros::call(c10::ArrayRef, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:03:30.0129626Z U c10::FloatType::get() 2025-05-07T20:03:30.0129972Z U c10::IValue::reportToTensorTypeError() const 2025-05-07T20:03:30.0130413Z U c10::MessageLogger::MessageLogger(char const*, int, int) 2025-05-07T20:03:30.0130921Z U c10::MessageLogger::~MessageLogger() 2025-05-07T20:03:30.0131300Z U c10::SymFloat::guard_float(char const*, long) const 2025-05-07T20:03:30.0131679Z U c10::TensorType::get() 2025-05-07T20:03:30.0132012Z U c10::UndefinedTensorImpl::_singleton 2025-05-07T20:03:30.0132760Z U c10::detail::infer_schema::make_function_schema(c10::ArrayRef, c10::ArrayRef) 2025-05-07T20:03:30.0133644Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) 2025-05-07T20:03:30.0134475Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) 2025-05-07T20:03:30.0135417Z U c10::detail::torchInternalAssertFail(char const*, char 
const*, unsigned int, char const*, char const*) 2025-05-07T20:03:30.0136446Z U c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string, std::allocator > const&) 2025-05-07T20:03:30.0137611Z U caffe2::TypeMeta::error_unsupported_typemeta(caffe2::TypeMeta) 2025-05-07T20:03:30.0138096Z U cudaGetErrorString@libcudart.so.12 2025-05-07T20:03:30.0138463Z U cudaGetLastError@libcudart.so.12 2025-05-07T20:03:30.0138855Z U cudaLaunchKernel@libcudart.so.12 2025-05-07T20:03:30.0139244Z U float* at::TensorBase::data_ptr() const 2025-05-07T20:03:30.0139726Z U long c10::detail::maybe_wrap_dim_slow(long, long, bool) 2025-05-07T20:03:30.0140156Z U memcpy@GLIBC_2.14 2025-05-07T20:03:30.0140538Z U memmove@GLIBC_2.2.5 2025-05-07T20:03:30.0140867Z U memset@GLIBC_2.2.5 2025-05-07T20:03:30.0141166Z U ncclCommDestroy 2025-05-07T20:03:30.0141484Z U ncclCommInitAll 2025-05-07T20:03:30.0141802Z U operator delete(void*)@GLIBCXX_3.4 2025-05-07T20:03:30.0142227Z U operator new(unsigned long)@GLIBCXX_3.4 2025-05-07T20:03:30.0142862Z U std::__cxx11::basic_ostringstream, std::allocator >::basic_ostringstream() 2025-05-07T20:03:30.0143747Z U std::__cxx11::basic_ostringstream, std::allocator >::~basic_ostringstream()@GLIBCXX_3.4.21 2025-05-07T20:03:30.0144465Z U std::__throw_bad_alloc()@GLIBCXX_3.4 2025-05-07T20:03:30.0144857Z U std::__throw_length_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:30.0145302Z U std::__throw_logic_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:30.0145850Z U std::basic_ios >::clear(std::_Ios_Iostate)@GLIBCXX_3.4 2025-05-07T20:03:30.0146834Z U std::basic_ostream >& std::__ostream_insert >(std::basic_ostream >&, char const*, long)@GLIBCXX_3.4.9 2025-05-07T20:03:30.0147706Z U std::ios_base::Init::Init()@GLIBCXX_3.4 2025-05-07T20:03:30.0148103Z U std::ios_base::Init::~Init()@GLIBCXX_3.4 2025-05-07T20:03:30.0148475Z U std::ios_base::~ios_base()@GLIBCXX_3.4 2025-05-07T20:03:30.0148856Z U 
std::locale::~locale()@GLIBCXX_3.4 2025-05-07T20:03:30.0149286Z U std::ostream& std::ostream::_M_insert(long)@GLIBCXX_3.4.9 2025-05-07T20:03:30.0149876Z U std::ostream::operator<<(int)@GLIBCXX_3.4 2025-05-07T20:03:30.0150554Z U std::runtime_error::runtime_error(std::__cxx11::basic_string, std::allocator > const&)@GLIBCXX_3.4.21 2025-05-07T20:03:30.0151202Z U std::runtime_error::~runtime_error()@GLIBCXX_3.4 2025-05-07T20:03:30.0152386Z U std::terminate()@GLIBCXX_3.4 2025-05-07T20:03:30.0152695Z U strlen@GLIBC_2.2.5 2025-05-07T20:03:30.0153016Z U torch::CppFunction::~CppFunction() 2025-05-07T20:03:30.0153795Z U torch::Library::Library(torch::Library::Kind, std::__cxx11::basic_string, std::allocator >, std::optional, char const*, unsigned int) 2025-05-07T20:03:30.0154923Z U torch::Library::_def(c10::FunctionSchema&&, c10::OperatorName*, std::vector > const&, torch::_RegisterOrVerify) & 2025-05-07T20:03:30.0155726Z U torch::Library::_impl(char const*, torch::CppFunction&&, torch::_RegisterOrVerify) & 2025-05-07T20:03:30.0156432Z U torch::jit::parseSchema(std::__cxx11::basic_string, std::allocator > const&, bool) 2025-05-07T20:03:30.0157028Z U typeinfo for std::runtime_error@GLIBCXX_3.4 2025-05-07T20:03:30.0157592Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:03:30.0158016Z U vtable for __cxxabiv1::__function_type_info@CXXABI_1.3 2025-05-07T20:03:30.0158464Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:03:30.0159076Z U vtable for std::__cxx11::basic_stringbuf, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:30.0159956Z U vtable for std::basic_streambuf >@GLIBCXX_3.4 2025-05-07T20:03:30.0160436Z w _ITM_deregisterTMCloneTable 2025-05-07T20:03:30.0160766Z w _ITM_registerTMCloneTable 2025-05-07T20:03:30.0161108Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:03:30.0161459Z w __gmon_start__ 2025-05-07T20:03:30.0161761Z w __pthread_key_create 2025-05-07T20:03:30.0162112Z [CHECK] Listing out external shared libraries linked: 
2025-05-07T20:03:30.0162753Z + ldd ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:30.0163255Z 2025-05-07T20:03:30.0163409Z linux-vdso.so.1 (0x00007fffb4262000) 2025-05-07T20:03:30.0163715Z libtorch.so => not found 2025-05-07T20:03:30.0163986Z libc10.so => not found 2025-05-07T20:03:30.0164228Z libnccl.so.2 => not found 2025-05-07T20:03:30.0164500Z libtorch_cpu.so => not found 2025-05-07T20:03:30.0164805Z libtorch_cuda.so => not found 2025-05-07T20:03:30.0165087Z libcudart.so.12 => not found 2025-05-07T20:03:30.0165434Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f0a3e756000) 2025-05-07T20:03:30.0165853Z libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f0a3e700000) 2025-05-07T20:03:30.0166280Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f0a3e6d2000) 2025-05-07T20:03:30.0166668Z libc.so.6 => /lib64/libc.so.6 (0x00007f0a3e4ca000) 2025-05-07T20:03:30.0167038Z libm.so.6 => /lib64/libm.so.6 (0x00007f0a3e3ef000) 2025-05-07T20:03:30.0167627Z /lib64/ld-linux-x86-64.so.2 (0x00007f0a3ea34000) 2025-05-07T20:03:30.0167882Z 2025-05-07T20:03:30.0167996Z [CHECK] Displaying ELF information: 2025-05-07T20:03:30.0168604Z + readelf -d ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:30.0169107Z 2025-05-07T20:03:30.0199666Z 2025-05-07T20:03:30.0200107Z Dynamic section at offset 0x71978 contains 36 entries: 2025-05-07T20:03:30.0200799Z Tag Type Name/Value 2025-05-07T20:03:30.0201299Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:03:30.0201814Z 0x0000000000000001 (NEEDED) Shared library: [libc10.so] 2025-05-07T20:03:30.0202340Z 0x0000000000000001 (NEEDED) Shared library: [libnccl.so.2] 2025-05-07T20:03:30.0202864Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:03:30.0203407Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:03:30.0204221Z 0x0000000000000001 (NEEDED) Shared 
library: [libcudart.so.12] 2025-05-07T20:03:30.0204767Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:03:30.0205399Z 0x0000000000000001 (NEEDED) Shared library: [libgomp.so.1] 2025-05-07T20:03:30.0205880Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:03:30.0206358Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:03:30.0206897Z 0x000000000000000e (SONAME) Library soname: [fbgemm_gpu_experimental_example_py.so] 2025-05-07T20:03:30.0207380Z 0x000000000000000c (INIT) 0x5000 2025-05-07T20:03:30.0207689Z 0x000000000000000d (FINI) 0x98dc 2025-05-07T20:03:30.0208017Z 0x0000000000000019 (INIT_ARRAY) 0x727d0 2025-05-07T20:03:30.0208359Z 0x000000000000001b (INIT_ARRAYSZ) 32 (bytes) 2025-05-07T20:03:30.0208683Z 0x000000000000001a (FINI_ARRAY) 0x727f0 2025-05-07T20:03:30.0209020Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:30.0209344Z 0x000000006ffffef5 (GNU_HASH) 0x200 2025-05-07T20:03:30.0209662Z 0x0000000000000005 (STRTAB) 0x1448 2025-05-07T20:03:30.0209960Z 0x0000000000000006 (SYMTAB) 0x5c0 2025-05-07T20:03:30.0210291Z 0x000000000000000a (STRSZ) 9972 (bytes) 2025-05-07T20:03:30.0210635Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:03:30.0210958Z 0x0000000000000003 (PLTGOT) 0x72c08 2025-05-07T20:03:30.0211303Z 0x0000000000000002 (PLTRELSZ) 2208 (bytes) 2025-05-07T20:03:30.0211627Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:03:30.0211942Z 0x0000000000000017 (JMPREL) 0x4530 2025-05-07T20:03:30.0212328Z 0x0000000000000007 (RELA) 0x3d38 2025-05-07T20:03:30.0212666Z 0x0000000000000008 (RELASZ) 2040 (bytes) 2025-05-07T20:03:30.0212998Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:03:30.0213327Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:03:30.0213702Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:03:30.0214029Z 0x000000006ffffffe (VERNEED) 0x3c78 2025-05-07T20:03:30.0214355Z 0x000000006fffffff (VERNEEDNUM) 4 2025-05-07T20:03:30.0214662Z 0x000000006ffffff0 
(VERSYM) 0x3b3c 2025-05-07T20:03:30.0214982Z 0x000000006ffffff9 (RELACOUNT) 7 2025-05-07T20:03:30.0215337Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:03:30.0215546Z 2025-05-07T20:03:30.0215676Z ################################################################################ 2025-05-07T20:03:30.0215892Z 2025-05-07T20:03:30.0215896Z 2025-05-07T20:03:30.0216019Z ################################################################################ 2025-05-07T20:03:30.0216612Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.9/cmake-build/asmjit.so 2025-05-07T20:03:30.0217176Z [CHECK] Listing out library size: 2025-05-07T20:03:30.0217744Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.9/cmake-build/asmjit.so 2025-05-07T20:03:30.0218075Z 2025-05-07T20:03:30.0218285Z 1 ./_skbuild/linux-x86_64-3.9/cmake-build/asmjit.so 2025-05-07T20:03:30.0218526Z 2025-05-07T20:03:30.0218868Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.9/cmake-build/asmjit.so 2025-05-07T20:03:30.0219742Z + objdump -TC ./_skbuild/linux-x86_64-3.9/cmake-build/asmjit.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:03:30.0220300Z 2025-05-07T20:03:30.0277960Z GLIBC_2.2.5 2025-05-07T20:03:30.0278561Z GLIBC_2.14 2025-05-07T20:03:30.0279566Z 2025-05-07T20:03:30.0279910Z 2025-05-07T20:03:30.0280586Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.9/cmake-build/asmjit.so 2025-05-07T20:03:30.0281578Z + objdump -TC ./_skbuild/linux-x86_64-3.9/cmake-build/asmjit.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:03:30.0282145Z 2025-05-07T20:03:30.0340273Z GLIBCXX_3.4 2025-05-07T20:03:30.0341849Z 2025-05-07T20:03:30.0341854Z 2025-05-07T20:03:30.0364723Z + nm -gDC ./_skbuild/linux-x86_64-3.9/cmake-build/asmjit.so > /tmp/tmp.jcHpc99DYG.symbols.txt 2025-05-07T20:03:30.0365339Z 2025-05-07T20:03:30.0399245Z 2025-05-07T20:03:30.0429967Z [CHECK] Total Number of symbols: 
841 2025-05-07T20:03:30.0449742Z [CHECK] Number of fbgemm symbols: 0 2025-05-07T20:03:30.0467713Z + nm -gDCu ./_skbuild/linux-x86_64-3.9/cmake-build/asmjit.so > /tmp/tmp.cMnl0YBdZR.usymbols.txt 2025-05-07T20:03:30.0468351Z 2025-05-07T20:03:30.0488158Z 2025-05-07T20:03:30.0516620Z [CHECK] Listing out undefined symbols (51 total): 2025-05-07T20:03:30.0539073Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:03:30.0539477Z U __cxa_begin_catch@CXXABI_1.3 2025-05-07T20:03:30.0539851Z U __cxa_guard_acquire@CXXABI_1.3 2025-05-07T20:03:30.0540181Z U __cxa_guard_release@CXXABI_1.3 2025-05-07T20:03:30.0540526Z U __errno_location@GLIBC_2.2.5 2025-05-07T20:03:30.0540859Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:03:30.0541197Z U abort@GLIBC_2.2.5 2025-05-07T20:03:30.0541483Z U bcmp@GLIBC_2.2.5 2025-05-07T20:03:30.0541779Z U close@GLIBC_2.2.5 2025-05-07T20:03:30.0542074Z U fputs@GLIBC_2.2.5 2025-05-07T20:03:30.0542349Z U free@GLIBC_2.2.5 2025-05-07T20:03:30.0542653Z U ftruncate64@GLIBC_2.2.5 2025-05-07T20:03:30.0542958Z U fwrite@GLIBC_2.2.5 2025-05-07T20:03:30.0543262Z U getenv@GLIBC_2.2.5 2025-05-07T20:03:30.0543553Z U getpagesize@GLIBC_2.2.5 2025-05-07T20:03:30.0543869Z U madvise@GLIBC_2.2.5 2025-05-07T20:03:30.0544337Z U malloc@GLIBC_2.2.5 2025-05-07T20:03:30.0544641Z U memcmp@GLIBC_2.2.5 2025-05-07T20:03:30.0544926Z U memcpy@GLIBC_2.14 2025-05-07T20:03:30.0545231Z U memmove@GLIBC_2.2.5 2025-05-07T20:03:30.0545533Z U memset@GLIBC_2.2.5 2025-05-07T20:03:30.0545862Z U mmap@GLIBC_2.2.5 2025-05-07T20:03:30.0546280Z U mprotect@GLIBC_2.2.5 2025-05-07T20:03:30.0546567Z U munmap@GLIBC_2.2.5 2025-05-07T20:03:30.0546863Z U open64@GLIBC_2.2.5 2025-05-07T20:03:30.0547280Z U operator delete(void*)@GLIBCXX_3.4 2025-05-07T20:03:30.0547662Z U pthread_mutex_destroy@GLIBC_2.2.5 2025-05-07T20:03:30.0547976Z U pthread_mutex_lock@GLIBC_2.2.5 2025-05-07T20:03:30.0548301Z U pthread_mutex_unlock@GLIBC_2.2.5 2025-05-07T20:03:30.0548616Z U read@GLIBC_2.2.5 2025-05-07T20:03:30.0548879Z U 
realloc@GLIBC_2.2.5 2025-05-07T20:03:30.0549163Z U shm_open@GLIBC_2.2.5 2025-05-07T20:03:30.0549435Z U shm_unlink@GLIBC_2.2.5 2025-05-07T20:03:30.0549729Z U snprintf@GLIBC_2.2.5 2025-05-07T20:03:30.0550024Z U std::terminate()@GLIBCXX_3.4 2025-05-07T20:03:30.0550323Z U stderr@GLIBC_2.2.5 2025-05-07T20:03:30.0550581Z U strcmp@GLIBC_2.2.5 2025-05-07T20:03:30.0550852Z U strlen@GLIBC_2.2.5 2025-05-07T20:03:30.0551132Z U strtol@GLIBC_2.2.5 2025-05-07T20:03:30.0551396Z U syscall@GLIBC_2.2.5 2025-05-07T20:03:30.0551676Z U sysconf@GLIBC_2.2.5 2025-05-07T20:03:30.0551941Z U uname@GLIBC_2.2.5 2025-05-07T20:03:30.0552216Z U unlink@GLIBC_2.2.5 2025-05-07T20:03:30.0552483Z U vsnprintf@GLIBC_2.2.5 2025-05-07T20:03:30.0552826Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:03:30.0553226Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:03:30.0553649Z U vtable for __cxxabiv1::__vmi_class_type_info@CXXABI_1.3 2025-05-07T20:03:30.0554073Z w _ITM_deregisterTMCloneTable 2025-05-07T20:03:30.0554382Z w _ITM_registerTMCloneTable 2025-05-07T20:03:30.0554691Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:03:30.0554969Z w __gmon_start__ 2025-05-07T20:03:30.0555299Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:03:30.0555672Z + ldd ./_skbuild/linux-x86_64-3.9/cmake-build/asmjit.so 2025-05-07T20:03:30.0555926Z 2025-05-07T20:03:30.0584827Z linux-vdso.so.1 (0x00007ffc381ab000) 2025-05-07T20:03:30.0585179Z libtorch_cpu.so => not found 2025-05-07T20:03:30.0585474Z libtorch_cuda.so => not found 2025-05-07T20:03:30.0585762Z libtorch.so => not found 2025-05-07T20:03:30.0586092Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fd72b58d000) 2025-05-07T20:03:30.0586539Z libgomp.so.1 => /lib64/libgomp.so.1 (0x00007fd72b537000) 2025-05-07T20:03:30.0586934Z librt.so.1 => /lib64/librt.so.1 (0x00007fd72b530000) 2025-05-07T20:03:30.0587341Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fd72b502000) 2025-05-07T20:03:30.0587789Z libpthread.so.0 => 
/lib64/libpthread.so.0 (0x00007fd72b4fd000) 2025-05-07T20:03:30.0588203Z libc.so.6 => /lib64/libc.so.6 (0x00007fd72b2f5000) 2025-05-07T20:03:30.0588580Z libm.so.6 => /lib64/libm.so.6 (0x00007fd72b21a000) 2025-05-07T20:03:30.0588943Z /lib64/ld-linux-x86-64.so.2 (0x00007fd72b86d000) 2025-05-07T20:03:30.0589197Z 2025-05-07T20:03:30.0589305Z [CHECK] Displaying ELF information: 2025-05-07T20:03:30.0589684Z + readelf -d ./_skbuild/linux-x86_64-3.9/cmake-build/asmjit.so 2025-05-07T20:03:30.0589981Z 2025-05-07T20:03:30.0622726Z 2025-05-07T20:03:30.0623032Z Dynamic section at offset 0x74dd0 contains 35 entries: 2025-05-07T20:03:30.0624365Z Tag Type Name/Value 2025-05-07T20:03:30.0625123Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:03:30.0625722Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:03:30.0626299Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:03:30.0626912Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:03:30.0627480Z 0x0000000000000001 (NEEDED) Shared library: [libgomp.so.1] 2025-05-07T20:03:30.0628005Z 0x0000000000000001 (NEEDED) Shared library: [librt.so.1] 2025-05-07T20:03:30.0628545Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:03:30.0629149Z 0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0] 2025-05-07T20:03:30.0629706Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:03:30.0630241Z 0x000000000000000e (SONAME) Library soname: [asmjit.so] 2025-05-07T20:03:30.0630678Z 0x000000000000000c (INIT) 0x19000 2025-05-07T20:03:30.0631048Z 0x000000000000000d (FINI) 0x56a1c 2025-05-07T20:03:30.0631389Z 0x0000000000000019 (INIT_ARRAY) 0x74ff8 2025-05-07T20:03:30.0631778Z 0x000000000000001b (INIT_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:30.0632138Z 0x000000000000001a (FINI_ARRAY) 0x75000 2025-05-07T20:03:30.0632519Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:30.0632898Z 
0x000000006ffffef5 (GNU_HASH) 0x200 2025-05-07T20:03:30.0633242Z 0x0000000000000005 (STRTAB) 0x7120 2025-05-07T20:03:30.0633603Z 0x0000000000000006 (SYMTAB) 0x2230 2025-05-07T20:03:30.0633975Z 0x000000000000000a (STRSZ) 48789 (bytes) 2025-05-07T20:03:30.0634376Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:03:30.0634737Z 0x0000000000000003 (PLTGOT) 0x76050 2025-05-07T20:03:30.0635138Z 0x0000000000000002 (PLTRELSZ) 8472 (bytes) 2025-05-07T20:03:30.0635507Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:03:30.0635872Z 0x0000000000000017 (JMPREL) 0x16a58 2025-05-07T20:03:30.0636244Z 0x0000000000000007 (RELA) 0x13710 2025-05-07T20:03:30.0636687Z 0x0000000000000008 (RELASZ) 13128 (bytes) 2025-05-07T20:03:30.0637093Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:03:30.0637543Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:03:30.0637910Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:03:30.0638274Z 0x000000006ffffffe (VERNEED) 0x13650 2025-05-07T20:03:30.0638638Z 0x000000006fffffff (VERNEEDNUM) 5 2025-05-07T20:03:30.0638975Z 0x000000006ffffff0 (VERSYM) 0x12fb6 2025-05-07T20:03:30.0639378Z 0x000000006ffffff9 (RELACOUNT) 3 2025-05-07T20:03:30.0639727Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:03:30.0639951Z 2025-05-07T20:03:30.0640076Z ################################################################################ 2025-05-07T20:03:30.0640344Z 2025-05-07T20:03:30.0640348Z 2025-05-07T20:03:30.0640470Z ################################################################################ 2025-05-07T20:03:30.0640951Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.9/cmake-build/fbgemm.so 2025-05-07T20:03:30.0641379Z [CHECK] Listing out library size: 2025-05-07T20:03:30.0641821Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.9/cmake-build/fbgemm.so 2025-05-07T20:03:30.0642138Z 2025-05-07T20:03:30.0642288Z 6 ./_skbuild/linux-x86_64-3.9/cmake-build/fbgemm.so 2025-05-07T20:03:30.0642551Z 2025-05-07T20:03:30.0642882Z [CHECK] Listing out the GLIBC versions 
referenced by: ./_skbuild/linux-x86_64-3.9/cmake-build/fbgemm.so 2025-05-07T20:03:30.0643770Z + objdump -TC ./_skbuild/linux-x86_64-3.9/cmake-build/fbgemm.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:03:30.0644303Z 2025-05-07T20:03:30.0917573Z GLIBC_2.2.5 2025-05-07T20:03:30.0918711Z GLIBC_2.3 2025-05-07T20:03:30.0919249Z GLIBC_2.14 2025-05-07T20:03:30.0919589Z 2025-05-07T20:03:30.0919631Z 2025-05-07T20:03:30.0920657Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.9/cmake-build/fbgemm.so 2025-05-07T20:03:30.0923497Z + objdump -TC ./_skbuild/linux-x86_64-3.9/cmake-build/fbgemm.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:03:30.0924188Z 2025-05-07T20:03:30.1186451Z GLIBCXX_3.4 2025-05-07T20:03:30.1187100Z GLIBCXX_3.4.9 2025-05-07T20:03:30.1187685Z GLIBCXX_3.4.11 2025-05-07T20:03:30.1188293Z GLIBCXX_3.4.14 2025-05-07T20:03:30.1188859Z GLIBCXX_3.4.15 2025-05-07T20:03:30.1189458Z GLIBCXX_3.4.18 2025-05-07T20:03:30.1190333Z GLIBCXX_3.4.21 2025-05-07T20:03:30.1190740Z 2025-05-07T20:03:30.1190871Z 2025-05-07T20:03:30.1208840Z + nm -gDC ./_skbuild/linux-x86_64-3.9/cmake-build/fbgemm.so > /tmp/tmp.9bEu1ZIDDY.symbols.txt 2025-05-07T20:03:30.1209280Z 2025-05-07T20:03:30.1436771Z 2025-05-07T20:03:30.1470745Z [CHECK] Total Number of symbols: 4951 2025-05-07T20:03:30.1494202Z [CHECK] Number of fbgemm symbols: 3554 2025-05-07T20:03:30.1513774Z + nm -gDCu ./_skbuild/linux-x86_64-3.9/cmake-build/fbgemm.so > /tmp/tmp.yjFLpPMfxr.usymbols.txt 2025-05-07T20:03:30.1515104Z 2025-05-07T20:03:30.1541117Z 2025-05-07T20:03:30.1570058Z [CHECK] Listing out undefined symbols (133 total): 2025-05-07T20:03:30.1585201Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:03:30.1586229Z U __cxa_allocate_exception@CXXABI_1.3 2025-05-07T20:03:30.1587300Z U __cxa_atexit@GLIBC_2.2.5 2025-05-07T20:03:30.1588207Z U __cxa_begin_catch@CXXABI_1.3 2025-05-07T20:03:30.1589125Z U 
__cxa_end_catch@CXXABI_1.3 2025-05-07T20:03:30.1590037Z U __cxa_free_exception@CXXABI_1.3 2025-05-07T20:03:30.1590981Z U __cxa_guard_abort@CXXABI_1.3 2025-05-07T20:03:30.1591873Z U __cxa_guard_acquire@CXXABI_1.3 2025-05-07T20:03:30.1592209Z U __cxa_guard_release@CXXABI_1.3 2025-05-07T20:03:30.1592556Z U __cxa_init_primary_exception@CXXABI_1.3.11 2025-05-07T20:03:30.1592911Z U __cxa_rethrow@CXXABI_1.3 2025-05-07T20:03:30.1593372Z U __cxa_thread_atexit@CXXABI_1.3.7 2025-05-07T20:03:30.1593818Z U __cxa_throw@CXXABI_1.3 2025-05-07T20:03:30.1594128Z U __extendhfsf2@GCC_12.0.0 2025-05-07T20:03:30.1594437Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:03:30.1594760Z U __once_proxy@GLIBCXX_3.4.11 2025-05-07T20:03:30.1595056Z U __tls_get_addr@GLIBC_2.3 2025-05-07T20:03:30.1595354Z U __truncsfhf2@GCC_12.0.0 2025-05-07T20:03:30.1595643Z U abort@GLIBC_2.2.5 2025-05-07T20:03:30.1596126Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:03:30.1596990Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:03:30.1597913Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:03:30.1599051Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:03:30.1600210Z U asmjit::_abi_1_13::BaseEmitter::emitArgsAssignment(asmjit::_abi_1_13::FuncFrame const&, asmjit::_abi_1_13::FuncArgsAssignment const&) 2025-05-07T20:03:30.1600947Z U asmjit::_abi_1_13::BaseEmitter::emitEpilog(asmjit::_abi_1_13::FuncFrame const&) 2025-05-07T20:03:30.1601511Z U asmjit::_abi_1_13::BaseEmitter::emitProlog(asmjit::_abi_1_13::FuncFrame const&) 2025-05-07T20:03:30.1602095Z U 
asmjit::_abi_1_13::CodeHolder::CodeHolder(asmjit::_abi_1_13::Support::Temporary const*) 2025-05-07T20:03:30.1602767Z U asmjit::_abi_1_13::CodeHolder::init(asmjit::_abi_1_13::Environment const&, unsigned long) 2025-05-07T20:03:30.1603266Z U asmjit::_abi_1_13::CodeHolder::~CodeHolder() 2025-05-07T20:03:30.1603837Z U asmjit::_abi_1_13::FuncArgsAssignment::updateFuncFrame(asmjit::_abi_1_13::FuncFrame&) const 2025-05-07T20:03:30.1604559Z U asmjit::_abi_1_13::FuncDetail::init(asmjit::_abi_1_13::FuncSignature const&, asmjit::_abi_1_13::Environment const&) 2025-05-07T20:03:30.1605135Z U asmjit::_abi_1_13::FuncFrame::finalize() 2025-05-07T20:03:30.1605613Z U asmjit::_abi_1_13::FuncFrame::init(asmjit::_abi_1_13::FuncDetail const&) 2025-05-07T20:03:30.1606217Z U asmjit::_abi_1_13::JitRuntime::JitRuntime(asmjit::_abi_1_13::JitAllocator::CreateParams const*) 2025-05-07T20:03:30.1606737Z U asmjit::_abi_1_13::JitRuntime::~JitRuntime() 2025-05-07T20:03:30.1607196Z U asmjit::_abi_1_13::x86::Assembler::Assembler(asmjit::_abi_1_13::CodeHolder*) 2025-05-07T20:03:30.1607657Z U asmjit::_abi_1_13::x86::Assembler::~Assembler() 2025-05-07T20:03:30.1607981Z U bcmp@GLIBC_2.2.5 2025-05-07T20:03:30.1608250Z U ceilf@GLIBC_2.2.5 2025-05-07T20:03:30.1608511Z U cpuinfo_get_packages 2025-05-07T20:03:30.1608804Z U cpuinfo_get_packages_count 2025-05-07T20:03:30.1609085Z U cpuinfo_initialize 2025-05-07T20:03:30.1609361Z U cpuinfo_isa 2025-05-07T20:03:30.1609609Z U floor@GLIBC_2.2.5 2025-05-07T20:03:30.1609863Z U fma@GLIBC_2.2.5 2025-05-07T20:03:30.1610124Z U fmaf@GLIBC_2.2.5 2025-05-07T20:03:30.1610375Z U free@GLIBC_2.2.5 2025-05-07T20:03:30.1610639Z U fwrite@GLIBC_2.2.5 2025-05-07T20:03:30.1610901Z U getenv@GLIBC_2.2.5 2025-05-07T20:03:30.1611169Z U ldexp@GLIBC_2.2.5 2025-05-07T20:03:30.1611420Z U log2@GLIBC_2.2.5 2025-05-07T20:03:30.1611683Z U log2f@GLIBC_2.2.5 2025-05-07T20:03:30.1611975Z U lrintf@GLIBC_2.2.5 2025-05-07T20:03:30.1612250Z U memcpy@GLIBC_2.14 2025-05-07T20:03:30.1612523Z U 
memmove@GLIBC_2.2.5 2025-05-07T20:03:30.1612787Z U memset@GLIBC_2.2.5 2025-05-07T20:03:30.1613063Z U nearbyint@GLIBC_2.2.5 2025-05-07T20:03:30.1613342Z U nearbyintf@GLIBC_2.2.5 2025-05-07T20:03:30.1613656Z U operator delete(void*)@GLIBCXX_3.4 2025-05-07T20:03:30.1613972Z U operator delete[](void*)@GLIBCXX_3.4 2025-05-07T20:03:30.1614307Z U operator new(unsigned long)@GLIBCXX_3.4 2025-05-07T20:03:30.1614638Z U operator new[](unsigned long)@GLIBCXX_3.4 2025-05-07T20:03:30.1614972Z U posix_memalign@GLIBC_2.2.5 2025-05-07T20:03:30.1615264Z U sqrtf@GLIBC_2.2.5 2025-05-07T20:03:30.1615627Z U std::_Hash_bytes(void const*, unsigned long, unsigned long)@CXXABI_1.3.5 2025-05-07T20:03:30.1616108Z U std::_Rb_tree_decrement(std::_Rb_tree_node_base*)@GLIBCXX_3.4 2025-05-07T20:03:30.1616529Z U std::_Rb_tree_increment(std::_Rb_tree_node_base*)@GLIBCXX_3.4 2025-05-07T20:03:30.1617265Z U std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)@GLIBCXX_3.4 2025-05-07T20:03:30.1618205Z U std::__atomic_futex_unsigned_base::_M_futex_notify_all(unsigned int*)@GLIBCXX_3.4.21 2025-05-07T20:03:30.1619263Z U std::__atomic_futex_unsigned_base::_M_futex_wait_until(unsigned int*, unsigned int, bool, std::chrono::duration >, std::chrono::duration >)@GLIBCXX_3.4.21 2025-05-07T20:03:30.1620613Z U std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const@GLIBCXX_3.4.18 2025-05-07T20:03:30.1621380Z U std::__detail::_Prime_rehash_policy::_M_next_bkt(unsigned long) const@GLIBCXX_3.4.18 2025-05-07T20:03:30.1621928Z U std::__exception_ptr::exception_ptr::_M_addref() 2025-05-07T20:03:30.1622340Z U std::__exception_ptr::exception_ptr::_M_release() 2025-05-07T20:03:30.1622826Z U std::__exception_ptr::exception_ptr::exception_ptr(void*)@CXXABI_1.3.11 2025-05-07T20:03:30.1623340Z U std::__future_base::_Result_base::_Result_base()@GLIBCXX_3.4.15 2025-05-07T20:03:30.1623863Z U 
std::__future_base::_Result_base::~_Result_base()@GLIBCXX_3.4.15 2025-05-07T20:03:30.1624270Z U std::__once_call@GLIBCXX_3.4.11 2025-05-07T20:03:30.1634727Z U std::__once_callable@GLIBCXX_3.4.11 2025-05-07T20:03:30.1635405Z U std::__throw_bad_alloc()@GLIBCXX_3.4 2025-05-07T20:03:30.1635779Z U std::__throw_bad_array_new_length() 2025-05-07T20:03:30.1636169Z U std::__throw_bad_cast()@GLIBCXX_3.4 2025-05-07T20:03:30.1636554Z U std::__throw_bad_function_call()@GLIBCXX_3.4.14 2025-05-07T20:03:30.1636973Z U std::__throw_future_error(int)@GLIBCXX_3.4.14 2025-05-07T20:03:30.1637379Z U std::__throw_length_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:30.1637807Z U std::__throw_system_error(int)@GLIBCXX_3.4.11 2025-05-07T20:03:30.1638209Z U std::bad_alloc::~bad_alloc()@GLIBCXX_3.4 2025-05-07T20:03:30.1639067Z U std::basic_ostream >& std::__ostream_insert >(std::basic_ostream >&, char const*, long)@GLIBCXX_3.4.9 2025-05-07T20:03:30.1639914Z U std::cerr@GLIBCXX_3.4 2025-05-07T20:03:30.1640229Z U std::cout@GLIBCXX_3.4 2025-05-07T20:03:30.1640625Z U std::ctype::_M_widen_init() const@GLIBCXX_3.4.11 2025-05-07T20:03:30.1641060Z U std::future_category()@GLIBCXX_3.4.15 2025-05-07T20:03:30.1641549Z U std::future_error::~future_error()@GLIBCXX_3.4.14 2025-05-07T20:03:30.1641977Z U std::ios_base::Init::Init()@GLIBCXX_3.4 2025-05-07T20:03:30.1642349Z U std::ios_base::Init::~Init()@GLIBCXX_3.4 2025-05-07T20:03:30.1643056Z U std::logic_error::logic_error(std::__cxx11::basic_string, std::allocator > const&)@GLIBCXX_3.4.21 2025-05-07T20:03:30.1643838Z U std::logic_error::logic_error(std::logic_error const&)@GLIBCXX_3.4.21 2025-05-07T20:03:30.1644389Z U std::ostream& std::ostream::_M_insert(double)@GLIBCXX_3.4.9 2025-05-07T20:03:30.1644932Z U std::ostream& std::ostream::_M_insert(long)@GLIBCXX_3.4.9 2025-05-07T20:03:30.1645493Z U std::ostream& std::ostream::_M_insert(unsigned long)@GLIBCXX_3.4.9 2025-05-07T20:03:30.1646013Z U std::ostream::flush()@GLIBCXX_3.4 2025-05-07T20:03:30.1646409Z U 
std::ostream::operator<<(int)@GLIBCXX_3.4 2025-05-07T20:03:30.1646786Z U std::ostream::put(char)@GLIBCXX_3.4 2025-05-07T20:03:30.1647277Z U std::rethrow_exception(std::__exception_ptr::exception_ptr)@CXXABI_1.3.3 2025-05-07T20:03:30.1647922Z U std::runtime_error::runtime_error(char const*)@GLIBCXX_3.4.21 2025-05-07T20:03:30.1648388Z U std::runtime_error::~runtime_error()@GLIBCXX_3.4 2025-05-07T20:03:30.1648783Z U std::terminate()@GLIBCXX_3.4 2025-05-07T20:03:30.1649104Z U stderr@GLIBC_2.2.5 2025-05-07T20:03:30.1649426Z U strcmp@GLIBC_2.2.5 2025-05-07T20:03:30.1649722Z U strlen@GLIBC_2.2.5 2025-05-07T20:03:30.1650179Z U strstr@GLIBC_2.2.5 2025-05-07T20:03:30.1650464Z U tolower@GLIBC_2.2.5 2025-05-07T20:03:30.1650776Z U toupper@GLIBC_2.2.5 2025-05-07T20:03:30.1651146Z U typeinfo for std::__future_base::_Result_base@GLIBCXX_3.4.15 2025-05-07T20:03:30.1651623Z U typeinfo for std::bad_alloc@GLIBCXX_3.4 2025-05-07T20:03:30.1652019Z U typeinfo for std::future_error@GLIBCXX_3.4.14 2025-05-07T20:03:30.1652396Z U typeinfo for std::runtime_error@GLIBCXX_3.4 2025-05-07T20:03:30.1652809Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:03:30.1653222Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:03:30.1653666Z U vtable for std::bad_alloc@GLIBCXX_3.4 2025-05-07T20:03:30.1654044Z U vtable for std::future_error@GLIBCXX_3.4.14 2025-05-07T20:03:30.1654396Z w _ITM_deregisterTMCloneTable 2025-05-07T20:03:30.1654743Z w _ITM_registerTMCloneTable 2025-05-07T20:03:30.1655053Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:03:30.1655384Z w __gmon_start__ 2025-05-07T20:03:30.1655667Z w __pthread_key_create 2025-05-07T20:03:30.1656002Z w pthread_mutex_lock@GLIBC_2.2.5 2025-05-07T20:03:30.1656325Z w pthread_mutex_unlock@GLIBC_2.2.5 2025-05-07T20:03:30.1656664Z w pthread_once 2025-05-07T20:03:30.1657046Z w pthread_rwlock_rdlock 2025-05-07T20:03:30.1657354Z w pthread_rwlock_unlock 2025-05-07T20:03:30.1657866Z w pthread_rwlock_wrlock 
2025-05-07T20:03:30.1658223Z w pthread_self@GLIBC_2.2.5 2025-05-07T20:03:30.1658632Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:03:30.1659055Z + ldd ./_skbuild/linux-x86_64-3.9/cmake-build/fbgemm.so 2025-05-07T20:03:30.1659343Z 2025-05-07T20:03:30.1659491Z linux-vdso.so.1 (0x00007ffe216b9000) 2025-05-07T20:03:30.1659822Z libc10.so => not found 2025-05-07T20:03:30.1660363Z asmjit.so => /__w/FBGEMM/FBGEMM/fbgemm_gpu/./_skbuild/linux-x86_64-3.9/cmake-build/asmjit.so (0x00007f14a5adc000) 2025-05-07T20:03:30.1661018Z libtorch.so => not found 2025-05-07T20:03:30.1661298Z libtorch_cpu.so => not found 2025-05-07T20:03:30.1661613Z libtorch_cuda.so => not found 2025-05-07T20:03:30.1661970Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f14a519c000) 2025-05-07T20:03:30.1662419Z libm.so.6 => /lib64/libm.so.6 (0x00007f14a59ff000) 2025-05-07T20:03:30.1662819Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f14a59d1000) 2025-05-07T20:03:30.1663247Z libc.so.6 => /lib64/libc.so.6 (0x00007f14a4f94000) 2025-05-07T20:03:30.1663659Z /lib64/ld-linux-x86-64.so.2 (0x00007f14a5b58000) 2025-05-07T20:03:30.1664009Z libtorch_cpu.so => not found 2025-05-07T20:03:30.1664329Z libtorch_cuda.so => not found 2025-05-07T20:03:30.1664614Z libtorch.so => not found 2025-05-07T20:03:30.1664972Z libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f14a5979000) 2025-05-07T20:03:30.1665382Z librt.so.1 => /lib64/librt.so.1 (0x00007f14a5974000) 2025-05-07T20:03:30.1665842Z libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f14a596f000) 2025-05-07T20:03:30.1666143Z 2025-05-07T20:03:30.1666294Z [CHECK] Displaying ELF information: 2025-05-07T20:03:30.1666685Z + readelf -d ./_skbuild/linux-x86_64-3.9/cmake-build/fbgemm.so 2025-05-07T20:03:30.1666975Z 2025-05-07T20:03:30.1681261Z 2025-05-07T20:03:30.1681766Z Dynamic section at offset 0x54b548 contains 37 entries: 2025-05-07T20:03:30.1682259Z Tag Type Name/Value 2025-05-07T20:03:30.1682694Z 0x0000000000000001 (NEEDED) Shared library: [libc10.so] 
2025-05-07T20:03:30.1683240Z 0x0000000000000001 (NEEDED) Shared library: [asmjit.so] 2025-05-07T20:03:30.1683746Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:03:30.1684294Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:03:30.1684965Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:03:30.1685522Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:03:30.1686067Z 0x0000000000000001 (NEEDED) Shared library: [libm.so.6] 2025-05-07T20:03:30.1686626Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:03:30.1687165Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:03:30.1687693Z 0x0000000000000001 (NEEDED) Shared library: [ld-linux-x86-64.so.2] 2025-05-07T20:03:30.1688336Z 0x000000000000000e (SONAME) Library soname: [fbgemm.so] 2025-05-07T20:03:30.1688854Z 0x000000000000000f (RPATH) Library rpath: [$ORIGIN] 2025-05-07T20:03:30.1689276Z 0x000000000000000c (INIT) 0xfd000 2025-05-07T20:03:30.1689643Z 0x000000000000000d (FINI) 0x4bfc58 2025-05-07T20:03:30.1689982Z 0x0000000000000019 (INIT_ARRAY) 0x548040 2025-05-07T20:03:30.1690336Z 0x000000000000001b (INIT_ARRAYSZ) 1224 (bytes) 2025-05-07T20:03:30.1690662Z 0x000000000000001a (FINI_ARRAY) 0x548508 2025-05-07T20:03:30.1690998Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:30.1691312Z 0x000000006ffffef5 (GNU_HASH) 0x238 2025-05-07T20:03:30.1691631Z 0x0000000000000005 (STRTAB) 0x24d98 2025-05-07T20:03:30.1691935Z 0x0000000000000006 (SYMTAB) 0x7d58 2025-05-07T20:03:30.1692262Z 0x000000000000000a (STRSZ) 754228 (bytes) 2025-05-07T20:03:30.1692612Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:03:30.1692929Z 0x0000000000000003 (PLTGOT) 0x54b7d8 2025-05-07T20:03:30.1693268Z 0x0000000000000002 (PLTRELSZ) 25992 (bytes) 2025-05-07T20:03:30.1693583Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:03:30.1693888Z 0x0000000000000017 (JMPREL) 0xf6410 
2025-05-07T20:03:30.1694201Z 0x0000000000000007 (RELA) 0xdf7f0 2025-05-07T20:03:30.1694516Z 0x0000000000000008 (RELASZ) 93216 (bytes) 2025-05-07T20:03:30.1694855Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:03:30.1695369Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:03:30.1695693Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:03:30.1696029Z 0x000000006ffffffe (VERNEED) 0xdf680 2025-05-07T20:03:30.1696362Z 0x000000006fffffff (VERNEEDNUM) 5 2025-05-07T20:03:30.1696673Z 0x000000006ffffff0 (VERSYM) 0xdcfcc 2025-05-07T20:03:30.1697106Z 0x000000006ffffff9 (RELACOUNT) 155 2025-05-07T20:03:30.1697579Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:03:30.1697796Z 2025-05-07T20:03:30.1697910Z ################################################################################ 2025-05-07T20:03:30.1698154Z 2025-05-07T20:03:30.1698158Z 2025-05-07T20:03:30.1698368Z [CHECK] Verifying sample subset of symbols in the built libraries ... 2025-05-07T20:03:30.1999967Z [CHECK] Found symbol in ./_skbuild/linux-x86_64-3.9/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so: fbgemm_gpu::per_tensor_quantize_i8 2025-05-07T20:03:30.2001365Z ################################################################################ 2025-05-07T20:03:30.2001909Z [BUILD] Wheel Audit: dist/fbgemm_gpu_genai_nightly-2025.5.7-cp39-cp39-manylinux_2_28_x86_64.whl 2025-05-07T20:03:30.2002319Z 2025-05-07T20:03:30.2008392Z + conda run --no-capture-output -n build_binary auditwheel show dist/fbgemm_gpu_genai_nightly-2025.5.7-cp39-cp39-manylinux_2_28_x86_64.whl 2025-05-07T20:03:30.2008982Z 2025-05-07T20:03:33.9039675Z 2025-05-07T20:03:33.9040436Z fbgemm_gpu_genai_nightly-2025.5.7-cp39-cp39-manylinux_2_28_x86_64.whl 2025-05-07T20:03:33.9041051Z is consistent with the following platform tag: "linux_x86_64". 
2025-05-07T20:03:33.9041360Z 2025-05-07T20:03:33.9041530Z The wheel references external versioned symbols in these 2025-05-07T20:03:33.9042197Z system-provided shared libraries: librt.so.1 with versions 2025-05-07T20:03:33.9042633Z {'GLIBC_2.2.5'}, libgcc_s.so.1 with versions {'GCC_12.0.0', 2025-05-07T20:03:33.9043049Z 'GCC_3.0'}, libstdc++.so.6 with versions {'CXXABI_1.3', 2025-05-07T20:03:33.9043467Z 'CXXABI_1.3.7', 'CXXABI_1.3.11', 'GLIBCXX_3.4.21', 'GLIBCXX_3.4', 2025-05-07T20:03:33.9044049Z 'GLIBCXX_3.4.14', 'CXXABI_1.3.3', 'GLIBCXX_3.4.29', 'GLIBCXX_3.4.9', 2025-05-07T20:03:33.9044539Z 'CXXABI_1.3.5', 'GLIBCXX_3.4.15', 'GLIBCXX_3.4.18', 'GLIBCXX_3.4.11'}, 2025-05-07T20:03:33.9045008Z libc.so.6 with versions {'GLIBC_2.17', 'GLIBC_2.6', 'GLIBC_2.2.5', 2025-05-07T20:03:33.9045445Z 'GLIBC_2.14', 'GLIBC_2.3', 'GLIBC_2.3.2', 'GLIBC_2.3.3'}, 2025-05-07T20:03:33.9045946Z libpthread.so.0 with versions {'GLIBC_2.3.4', 'GLIBC_2.2.5'}, 2025-05-07T20:03:33.9046442Z libm.so.6 with versions {'GLIBC_2.2.5'}, libcudart.so.12 with versions 2025-05-07T20:03:33.9046935Z {'libcudart.so.12'}, libdl.so.2 with versions {'GLIBC_2.3.4', 2025-05-07T20:03:33.9047299Z 'GLIBC_2.2.5'} 2025-05-07T20:03:33.9047425Z 2025-05-07T20:03:33.9047650Z This constrains the platform tag to "manylinux_2_35_x86_64". In order 2025-05-07T20:03:33.9048157Z to achieve a more compatible tag, you would need to recompile a new 2025-05-07T20:03:33.9048650Z wheel from source on a system with earlier versions of these 2025-05-07T20:03:33.9049054Z libraries, such as a recent manylinux image. 2025-05-07T20:03:33.9900564Z 2025-05-07T20:03:33.9900612Z 2025-05-07T20:03:33.9901158Z ################################################################################ 2025-05-07T20:03:33.9901788Z [BUILD] Enumerating the built wheels ... 2025-05-07T20:03:33.9902278Z + ls -lth dist/fbgemm_gpu_genai_nightly-2025.5.7-cp39-cp39-manylinux_2_28_x86_64.whl 2025-05-07T20:03:33.9902690Z 2025-05-07T20:03:33.9965715Z -rw-r--r--. 
1 root root 19M May 7 20:03 dist/fbgemm_gpu_genai_nightly-2025.5.7-cp39-cp39-manylinux_2_28_x86_64.whl 2025-05-07T20:03:33.9967127Z 2025-05-07T20:03:33.9967857Z [BUILD] Enumerating the wheel SHAs ... 2025-05-07T20:03:33.9973567Z + sha1sum dist/fbgemm_gpu_genai_nightly-2025.5.7-cp39-cp39-manylinux_2_28_x86_64.whl 2025-05-07T20:03:33.9974686Z 2025-05-07T20:03:34.0346190Z 94d0750d60163e549c1eb2cb2a791ec2cf9a4d41 dist/fbgemm_gpu_genai_nightly-2025.5.7-cp39-cp39-manylinux_2_28_x86_64.whl 2025-05-07T20:03:34.0347875Z 2025-05-07T20:03:34.0363364Z + sha256sum dist/fbgemm_gpu_genai_nightly-2025.5.7-cp39-cp39-manylinux_2_28_x86_64.whl 2025-05-07T20:03:34.0363902Z 2025-05-07T20:03:34.1180502Z 4ad1704987fa87cd63915598dc05a53ebebd35ab51336336eb8f0056001f042a dist/fbgemm_gpu_genai_nightly-2025.5.7-cp39-cp39-manylinux_2_28_x86_64.whl 2025-05-07T20:03:34.1181889Z 2025-05-07T20:03:34.1184828Z + md5sum dist/fbgemm_gpu_genai_nightly-2025.5.7-cp39-cp39-manylinux_2_28_x86_64.whl 2025-05-07T20:03:34.1185203Z 2025-05-07T20:03:34.1515979Z 5c45ae153a493153a2b0776bec42bc74 dist/fbgemm_gpu_genai_nightly-2025.5.7-cp39-cp39-manylinux_2_28_x86_64.whl 2025-05-07T20:03:34.1517542Z 2025-05-07T20:03:34.1517946Z [BUILD] FBGEMM-GPU build + package completed 2025-05-07T20:03:34.2908208Z ##[group]Run actions/upload-artifact@v4 2025-05-07T20:03:34.2908578Z with: 2025-05-07T20:03:34.2908874Z name: fbgemm_genai_x86_clang_py3.9_cu12.8.0.whl 2025-05-07T20:03:34.2909240Z path: fbgemm_gpu/dist/*.whl 2025-05-07T20:03:34.2909585Z if-no-files-found: error 2025-05-07T20:03:34.2909995Z compression-level: 6 2025-05-07T20:03:34.2910273Z overwrite: false 2025-05-07T20:03:34.2910562Z include-hidden-files: false 2025-05-07T20:03:34.2910832Z env: 2025-05-07T20:03:34.2911092Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T20:03:34.2911404Z BUILD_ENV: build_binary 2025-05-07T20:03:34.2911692Z BUILD_TARGET: genai 2025-05-07T20:03:34.2911947Z BUILD_VARIANT: cuda 2025-05-07T20:03:34.2912230Z BUILD_CUDA_VERSION: 12.8.0 
2025-05-07T20:03:34.2912488Z ##[endgroup] 2025-05-07T20:03:34.2924756Z ##[command]/usr/bin/docker exec 684dd9e27594bd891e6be6ff1899ad1c0eb766565fb4b8e7ad6703697bf1bd21 sh -c "cat /etc/*release | grep ^ID" 2025-05-07T20:03:35.2313625Z With the provided path, there will be 1 file uploaded 2025-05-07T20:03:35.2315183Z Artifact name is valid! 2025-05-07T20:03:35.2315943Z Root directory input is valid! 2025-05-07T20:03:35.3501096Z Beginning upload of artifact content to blob storage 2025-05-07T20:03:36.2359100Z Uploaded bytes 8388608 2025-05-07T20:03:36.5372565Z Uploaded bytes 16777216 2025-05-07T20:03:36.6357312Z Uploaded bytes 18501145 2025-05-07T20:03:36.6612197Z Finished uploading artifact content to blob storage! 2025-05-07T20:03:36.6614189Z SHA256 digest of uploaded artifact zip is 49d17600359b05f780104ac5b5c7182a7fffa14a07ce833b6d20dd778f161f31 2025-05-07T20:03:36.6615919Z Finalizing artifact upload 2025-05-07T20:03:36.7359903Z Artifact fbgemm_genai_x86_clang_py3.9_cu12.8.0.whl.zip successfully finalized. Artifact ID 3081405239 2025-05-07T20:03:36.7361467Z Artifact fbgemm_genai_x86_clang_py3.9_cu12.8.0.whl has been successfully uploaded! Final size is 18501145 bytes. Artifact ID is 3081405239 2025-05-07T20:03:36.7370636Z Artifact download URL: https://github.com/pytorch/FBGEMM/actions/runs/14891846252/artifacts/3081405239 2025-05-07T20:03:36.7690870Z Post job cleanup. 
2025-05-07T20:03:36.7704916Z ##[command]/usr/bin/docker exec 684dd9e27594bd891e6be6ff1899ad1c0eb766565fb4b8e7ad6703697bf1bd21 sh -c "cat /etc/*release | grep ^ID" 2025-05-07T20:03:37.0369220Z [command]/usr/bin/git version 2025-05-07T20:03:37.0633254Z git version 2.47.1 2025-05-07T20:03:37.0669271Z Copying '/github/home/.gitconfig' to '/__w/_temp/748f6004-7281-46a6-b673-42f760bf73d5/.gitconfig' 2025-05-07T20:03:37.0685872Z Temporarily overriding HOME='/__w/_temp/748f6004-7281-46a6-b673-42f760bf73d5' before making global git config changes 2025-05-07T20:03:37.0686774Z Adding repository directory to the temporary git global config as a safe directory 2025-05-07T20:03:37.0689494Z [command]/usr/bin/git config --global --add safe.directory /__w/FBGEMM/FBGEMM 2025-05-07T20:03:37.0752531Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-05-07T20:03:37.0781649Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-05-07T20:03:37.1344597Z Entering 'external/asmjit' 2025-05-07T20:03:37.1501912Z Entering 'external/composable_kernel' 2025-05-07T20:03:37.1668936Z Entering 'external/cpuinfo' 2025-05-07T20:03:37.1781746Z Entering 'external/cutlass' 2025-05-07T20:03:37.1960357Z Entering 'external/googletest' 2025-05-07T20:03:37.2071087Z Entering 'external/hipify_torch' 2025-05-07T20:03:37.2194477Z Entering 'external/json' 2025-05-07T20:03:37.2302289Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-05-07T20:03:37.2323248Z http.https://github.com/.extraheader 2025-05-07T20:03:37.2328906Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-05-07T20:03:37.2361948Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git 
config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-05-07T20:03:37.2647483Z Entering 'external/asmjit' 2025-05-07T20:03:37.2682614Z http.https://github.com/.extraheader 2025-05-07T20:03:37.2716274Z Entering 'external/composable_kernel' 2025-05-07T20:03:37.2747556Z http.https://github.com/.extraheader 2025-05-07T20:03:37.2796415Z Entering 'external/cpuinfo' 2025-05-07T20:03:37.2845531Z http.https://github.com/.extraheader 2025-05-07T20:03:37.2888825Z Entering 'external/cutlass' 2025-05-07T20:03:37.2921477Z http.https://github.com/.extraheader 2025-05-07T20:03:37.2965511Z Entering 'external/googletest' 2025-05-07T20:03:37.3007714Z http.https://github.com/.extraheader 2025-05-07T20:03:37.3050978Z Entering 'external/hipify_torch' 2025-05-07T20:03:37.3086032Z http.https://github.com/.extraheader 2025-05-07T20:03:37.3115678Z Entering 'external/json' 2025-05-07T20:03:37.3161804Z http.https://github.com/.extraheader 2025-05-07T20:03:37.3378218Z Stop and remove container: d27e40242d2e486782d6f283e7ffb112_amazonlinux2023_99beec 2025-05-07T20:03:37.3383551Z ##[command]/usr/bin/docker rm --force 684dd9e27594bd891e6be6ff1899ad1c0eb766565fb4b8e7ad6703697bf1bd21 2025-05-07T20:03:38.6003152Z 684dd9e27594bd891e6be6ff1899ad1c0eb766565fb4b8e7ad6703697bf1bd21 2025-05-07T20:03:38.6034872Z Remove container network: github_network_8b9c024716574df3abf522ec9f61fb6d 2025-05-07T20:03:38.6039332Z ##[command]/usr/bin/docker network rm github_network_8b9c024716574df3abf522ec9f61fb6d 2025-05-07T20:03:39.7039923Z github_network_8b9c024716574df3abf522ec9f61fb6d 2025-05-07T20:03:39.7083851Z A job completed hook has been configured by the self-hosted runner administrator 2025-05-07T20:03:39.7292607Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh' 2025-05-07T20:03:39.7298038Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-05-07T20:03:39.7298493Z ##[endgroup] 2025-05-07T20:03:51.8484653Z Cleaning up orphan processes