2025-05-07T19:42:32.7992681Z Current runner version: '2.323.0' 2025-05-07T19:42:32.7998558Z Runner name: 'i-02bb5f24b31fe2d00' 2025-05-07T19:42:32.7999536Z Machine name: 'ip-10-0-60-10' 2025-05-07T19:42:32.8002309Z ##[group]GITHUB_TOKEN Permissions 2025-05-07T19:42:32.8004367Z Contents: read 2025-05-07T19:42:32.8005015Z Metadata: read 2025-05-07T19:42:32.8005497Z Packages: read 2025-05-07T19:42:32.8006072Z ##[endgroup] 2025-05-07T19:42:32.8008092Z Secret source: None 2025-05-07T19:42:32.8008753Z Prepare workflow directory 2025-05-07T19:42:32.8615812Z Prepare all required actions 2025-05-07T19:42:32.8655807Z Getting action download info 2025-05-07T19:42:33.0372245Z Download action repository 'actions/checkout@v4' (SHA:11bd71901bbe5b1630ceea73d27597364c9af683) 2025-05-07T19:42:33.3206110Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-05-07T19:42:33.8566977Z Complete job name: build_artifact (x86, linux.24xlarge, genai, 3.13, 12.8.0, clang) 2025-05-07T19:42:33.9460591Z A job started hook has been configured by the self-hosted runner administrator 2025-05-07T19:42:33.9594958Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh' 2025-05-07T19:42:33.9605182Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:42:33.9606557Z ##[endgroup] 2025-05-07T19:42:35.0647761Z Runner Type: linux.24xlarge 2025-05-07T19:42:35.0648282Z Instance Type: c5.24xlarge 2025-05-07T19:42:35.0648612Z AMI Name: unknown 2025-05-07T19:42:35.0684785Z AMI ID: ami-071226ecf16aa7d96 2025-05-07T19:42:40.0726585Z ##[group]Checking docker version 2025-05-07T19:42:40.0739963Z ##[command]/usr/bin/docker version --format '{{.Server.APIVersion}}' 2025-05-07T19:42:40.0939857Z '1.44' 2025-05-07T19:42:40.0955766Z Docker daemon API version: '1.44' 2025-05-07T19:42:40.0956291Z ##[command]/usr/bin/docker version --format '{{.Client.APIVersion}}' 2025-05-07T19:42:40.1154604Z '1.44' 2025-05-07T19:42:40.1166031Z Docker client API version: '1.44' 2025-05-07T19:42:40.1170242Z ##[endgroup] 2025-05-07T19:42:40.1172724Z ##[group]Clean up resources from previous jobs 2025-05-07T19:42:40.1177799Z ##[command]/usr/bin/docker ps --all --quiet --no-trunc --filter "label=42a6b8" 2025-05-07T19:42:40.1325004Z ##[command]/usr/bin/docker network prune --force --filter "label=42a6b8" 2025-05-07T19:42:40.1463561Z ##[endgroup] 2025-05-07T19:42:40.1463888Z ##[group]Create local container network 2025-05-07T19:42:40.1473005Z ##[command]/usr/bin/docker network create --label 42a6b8 github_network_994bcd5a2cdf4c50adc703a7ba6189a2 2025-05-07T19:42:40.4000236Z ddac64f6d39d669147f18a6d3c1e00c1300cbdb5d0551fac7f1bd081c7998c58 2025-05-07T19:42:40.4023731Z ##[endgroup] 2025-05-07T19:42:40.4045077Z ##[group]Starting job container 2025-05-07T19:42:40.4064273Z ##[command]/usr/bin/docker pull amazonlinux:2023 2025-05-07T19:42:40.5243308Z 2023: Pulling from library/amazonlinux 2025-05-07T19:42:40.5362871Z Digest: sha256:cb5b4c509d62ae388f674c139ae5e8281fc160c217d474445e912043e1941988 2025-05-07T19:42:40.5363881Z Status: Image is up to date for amazonlinux:2023 2025-05-07T19:42:40.5388730Z docker.io/library/amazonlinux:2023 2025-05-07T19:42:40.5481841Z ##[command]/usr/bin/docker create --name 0703a8b60f604284bbb3ddbf668bc6f3_amazonlinux2023_76f532 --label 42a6b8 --workdir /__w/FBGEMM/FBGEMM --network github_network_994bcd5a2cdf4c50adc703a7ba6189a2 --user root -e "HOME=/github/home" -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/ec2-user/actions-runner/_work":"/__w" -v "/home/ec2-user/actions-runner/externals":"/__e":ro -v "/home/ec2-user/actions-runner/_work/_temp":"/__w/_temp" -v "/home/ec2-user/actions-runner/_work/_actions":"/__w/_actions" -v "/home/ec2-user/actions-runner/_work/_tool":"/__w/_tool" -v "/home/ec2-user/actions-runner/_work/_temp/_github_home":"/github/home" -v "/home/ec2-user/actions-runner/_work/_temp/_github_workflow":"/github/workflow" --entrypoint "tail" amazonlinux:2023 "-f" "/dev/null" 2025-05-07T19:42:40.5899772Z 619e625bd71e3b5994636768304a866b1a468a4f4ab5c062f71de4323d5ef686 2025-05-07T19:42:40.5927696Z ##[command]/usr/bin/docker start 619e625bd71e3b5994636768304a866b1a468a4f4ab5c062f71de4323d5ef686 2025-05-07T19:42:41.0684940Z 619e625bd71e3b5994636768304a866b1a468a4f4ab5c062f71de4323d5ef686 2025-05-07T19:42:41.0706488Z ##[command]/usr/bin/docker ps --all --filter id=619e625bd71e3b5994636768304a866b1a468a4f4ab5c062f71de4323d5ef686 --filter status=running --no-trunc --format "{{.ID}} {{.Status}}" 2025-05-07T19:42:41.0863636Z 619e625bd71e3b5994636768304a866b1a468a4f4ab5c062f71de4323d5ef686 Up Less than a second 2025-05-07T19:42:41.0887443Z ##[command]/usr/bin/docker inspect --format "{{range .Config.Env}}{{println .}}{{end}}" 619e625bd71e3b5994636768304a866b1a468a4f4ab5c062f71de4323d5ef686 2025-05-07T19:42:41.1042022Z CI=true 2025-05-07T19:42:41.1042555Z HOME=/github/home 2025-05-07T19:42:41.1042969Z GITHUB_ACTIONS=true 2025-05-07T19:42:41.1043598Z PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-05-07T19:42:41.1061151Z ##[endgroup] 2025-05-07T19:42:41.1070898Z ##[group]Waiting for all services to be ready 2025-05-07T19:42:41.1072477Z ##[endgroup] 2025-05-07T19:42:41.1146772Z ##[group]Run yum update -y; yum install -y binutils findutils git pciutils sudo tar wget which 2025-05-07T19:42:41.1147758Z yum update -y; yum install -y binutils findutils git pciutils sudo tar wget which 2025-05-07T19:42:41.1148592Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:42:41.1149037Z env: 2025-05-07T19:42:41.1149344Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:42:41.1149696Z BUILD_ENV: build_binary 2025-05-07T19:42:41.1150073Z BUILD_TARGET: genai 2025-05-07T19:42:41.1150351Z BUILD_VARIANT: cuda 2025-05-07T19:42:41.1150675Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:42:41.1151037Z ##[endgroup] 2025-05-07T19:42:41.9591627Z Amazon Linux 2023 repository 67 MB/s | 37 MB 00:00 2025-05-07T19:42:48.5836411Z Last metadata expiration check: 0:00:07 ago on Wed May 7 19:42:41 2025. 2025-05-07T19:42:49.1437009Z Dependencies resolved. 2025-05-07T19:42:49.1617892Z Nothing to do. 2025-05-07T19:42:49.1618672Z Complete! 2025-05-07T19:42:49.4086547Z Last metadata expiration check: 0:00:08 ago on Wed May 7 19:42:41 2025. 2025-05-07T19:42:49.4724152Z Dependencies resolved. 2025-05-07T19:42:49.4944666Z ======================================================================================== 2025-05-07T19:42:49.4945341Z Package Arch Version Repository Size 2025-05-07T19:42:49.4946129Z ======================================================================================== 2025-05-07T19:42:49.4946654Z Installing: 2025-05-07T19:42:49.4947114Z binutils x86_64 2.41-50.amzn2023.0.3 amazonlinux 5.3 M 2025-05-07T19:42:49.4947715Z findutils x86_64 1:4.8.0-2.amzn2023.0.2 amazonlinux 539 k 2025-05-07T19:42:49.4948324Z git x86_64 2.47.1-1.amzn2023.0.2 amazonlinux 54 k 2025-05-07T19:42:49.4948978Z pciutils x86_64 3.7.0-3.amzn2023.0.2 amazonlinux 93 k 2025-05-07T19:42:49.4949547Z sudo x86_64 1.9.15-1.p5.amzn2023.0.1 amazonlinux 1.3 M 2025-05-07T19:42:49.4950148Z tar x86_64 2:1.34-1.amzn2023.0.4 amazonlinux 879 k 2025-05-07T19:42:49.4950729Z wget x86_64 1.21.3-1.amzn2023.0.4 amazonlinux 779 k 2025-05-07T19:42:49.4951270Z which x86_64 2.21-26.amzn2023.0.2 amazonlinux 42 k 2025-05-07T19:42:49.4951803Z Installing dependencies: 2025-05-07T19:42:49.4952247Z cracklib x86_64 2.9.6-27.amzn2023.0.2 amazonlinux 82 k 2025-05-07T19:42:49.4952882Z cyrus-sasl-lib x86_64 2.1.27-18.amzn2023.0.3 amazonlinux 786 k 2025-05-07T19:42:49.4953620Z elfutils-debuginfod-client x86_64 0.188-3.amzn2023.0.2 amazonlinux 41 k 2025-05-07T19:42:49.4954269Z git-core x86_64 2.47.1-1.amzn2023.0.2 amazonlinux 4.7 M 2025-05-07T19:42:49.4955139Z git-core-doc noarch 2.47.1-1.amzn2023.0.2 amazonlinux 2.8 M 2025-05-07T19:42:49.4955733Z gnutls x86_64 3.8.3-6.amzn2023.0.1 amazonlinux 1.1 M 2025-05-07T19:42:49.4956377Z groff-base x86_64 1.22.4-7.amzn2023.0.2 amazonlinux 1.0 M 2025-05-07T19:42:49.4956971Z gzip x86_64 1.12-1.amzn2023.0.1 amazonlinux 160 k 2025-05-07T19:42:49.4957541Z hwdata noarch 0.384-1.amzn2023.0.3 amazonlinux 1.6 M 2025-05-07T19:42:49.4958301Z jansson x86_64 2.14-0.amzn2023 amazonlinux 46 k 2025-05-07T19:42:49.4958852Z kmod-libs x86_64 29-2.amzn2023.0.5 amazonlinux 62 k 2025-05-07T19:42:49.4959464Z less x86_64 608-2.amzn2023.0.2 amazonlinux 168 k 2025-05-07T19:42:49.4960189Z libcbor x86_64 0.7.0-3.amzn2023.0.2 amazonlinux 57 k 2025-05-07T19:42:49.4960728Z libdb x86_64 5.3.28-49.amzn2023.0.2 amazonlinux 756 k 2025-05-07T19:42:49.4961356Z libeconf x86_64 0.4.0-1.amzn2023.0.3 amazonlinux 28 k 2025-05-07T19:42:49.4961927Z libedit x86_64 3.1-38.20210714cvs.amzn2023.0.2 amazonlinux 108 k 2025-05-07T19:42:49.4962628Z libfdisk x86_64 2.37.4-1.amzn2023.0.4 amazonlinux 153 k 2025-05-07T19:42:49.4963206Z libfido2 x86_64 1.10.0-2.amzn2023.0.2 amazonlinux 95 k 2025-05-07T19:42:49.4963824Z libmetalink x86_64 0.1.3-14.amzn2023.0.2 amazonlinux 31 k 2025-05-07T19:42:49.4964466Z libpwquality x86_64 1.4.4-6.amzn2023.0.2 amazonlinux 106 k 2025-05-07T19:42:49.4964997Z libsemanage x86_64 3.4-5.amzn2023.0.2 amazonlinux 121 k 2025-05-07T19:42:49.4965645Z libutempter x86_64 1.2.1-4.amzn2023.0.2 amazonlinux 26 k 2025-05-07T19:42:49.4966215Z nano x86_64 8.3-1.amzn2023 amazonlinux 706 k 2025-05-07T19:42:49.4966700Z ncurses x86_64 6.2-4.20200222.amzn2023.0.6 amazonlinux 394 k 2025-05-07T19:42:49.4967303Z nettle x86_64 3.10.1-1.amzn2023.0.1 amazonlinux 573 k 2025-05-07T19:42:49.4967828Z openldap x86_64 2.4.57-6.amzn2023.0.7 amazonlinux 256 k 2025-05-07T19:42:49.4968377Z openssh x86_64 8.7p1-8.amzn2023.0.14 amazonlinux 454 k 2025-05-07T19:42:49.4969020Z openssh-clients x86_64 8.7p1-8.amzn2023.0.14 amazonlinux 708 k 2025-05-07T19:42:49.4969567Z pam x86_64 1.5.1-8.amzn2023.0.4 amazonlinux 542 k 2025-05-07T19:42:49.4970111Z pciutils-libs x86_64 3.7.0-3.amzn2023.0.2 amazonlinux 41 k 2025-05-07T19:42:49.5069960Z perl-AutoLoader noarch 5.74-477.amzn2023.0.6 amazonlinux 22 k 2025-05-07T19:42:49.5070681Z perl-B x86_64 1.80-477.amzn2023.0.6 amazonlinux 179 k 2025-05-07T19:42:49.5071222Z perl-Carp noarch 1.50-458.amzn2023.0.2 amazonlinux 29 k 2025-05-07T19:42:49.5071816Z perl-Class-Struct noarch 0.66-477.amzn2023.0.6 amazonlinux 22 k 2025-05-07T19:42:49.5072528Z perl-Data-Dumper x86_64 2.174-460.amzn2023.0.2 amazonlinux 55 k 2025-05-07T19:42:49.5073224Z perl-Digest noarch 1.20-1.amzn2023.0.2 amazonlinux 26 k 2025-05-07T19:42:49.5073803Z perl-Digest-MD5 x86_64 2.58-2.amzn2023.0.2 amazonlinux 36 k 2025-05-07T19:42:49.5074392Z perl-DynaLoader x86_64 1.47-477.amzn2023.0.6 amazonlinux 26 k 2025-05-07T19:42:49.5075438Z perl-Encode x86_64 4:3.15-462.amzn2023.0.2 amazonlinux 1.7 M 2025-05-07T19:42:49.5075988Z perl-Errno x86_64 1.30-477.amzn2023.0.6 amazonlinux 15 k 2025-05-07T19:42:49.5076564Z perl-Error noarch 1:0.17029-5.amzn2023.0.2 amazonlinux 41 k 2025-05-07T19:42:49.5077124Z perl-Exporter noarch 5.74-459.amzn2023.0.2 amazonlinux 31 k 2025-05-07T19:42:49.5077676Z perl-Fcntl x86_64 1.13-477.amzn2023.0.6 amazonlinux 21 k 2025-05-07T19:42:49.5078253Z perl-File-Basename noarch 2.85-477.amzn2023.0.6 amazonlinux 18 k 2025-05-07T19:42:49.5078867Z perl-File-Find noarch 1.37-477.amzn2023.0.6 amazonlinux 26 k 2025-05-07T19:42:49.5079463Z perl-File-Path noarch 2.18-2.amzn2023.0.2 amazonlinux 36 k 2025-05-07T19:42:49.5080043Z perl-File-Temp noarch 1:0.231.100-2.amzn2023.0.2 amazonlinux 60 k 2025-05-07T19:42:49.5080777Z perl-File-stat noarch 1.09-477.amzn2023.0.6 amazonlinux 17 k 2025-05-07T19:42:49.5081360Z perl-FileHandle noarch 2.03-477.amzn2023.0.6 amazonlinux 16 k 2025-05-07T19:42:49.5082037Z perl-Getopt-Long noarch 1:2.52-2.amzn2023.0.2 amazonlinux 60 k 2025-05-07T19:42:49.5082671Z perl-Getopt-Std noarch 1.12-477.amzn2023.0.6 amazonlinux 16 k 2025-05-07T19:42:49.5083225Z perl-Git noarch 2.47.1-1.amzn2023.0.2 amazonlinux 42 k 2025-05-07T19:42:49.5083799Z perl-HTTP-Tiny noarch 0.078-1.amzn2023.0.3 amazonlinux 56 k 2025-05-07T19:42:49.5084368Z perl-IO x86_64 1.43-477.amzn2023.0.6 amazonlinux 87 k 2025-05-07T19:42:49.5084957Z perl-IPC-Open3 noarch 1.21-477.amzn2023.0.6 amazonlinux 23 k 2025-05-07T19:42:49.5085546Z perl-MIME-Base64 x86_64 3.16-2.amzn2023.0.2 amazonlinux 31 k 2025-05-07T19:42:49.5086122Z perl-Net-SSLeay x86_64 1.94-1.amzn2023.0.1 amazonlinux 392 k 2025-05-07T19:42:49.5086684Z perl-POSIX x86_64 1.94-477.amzn2023.0.6 amazonlinux 97 k 2025-05-07T19:42:49.5087252Z perl-PathTools x86_64 3.78-459.amzn2023.0.2 amazonlinux 85 k 2025-05-07T19:42:49.5087833Z perl-Pod-Escapes noarch 1:1.07-458.amzn2023.0.2 amazonlinux 20 k 2025-05-07T19:42:49.5088441Z perl-Pod-Perldoc noarch 3.28.01-459.amzn2023.0.3 amazonlinux 84 k 2025-05-07T19:42:49.5089034Z perl-Pod-Simple noarch 1:3.42-2.amzn2023.0.2 amazonlinux 215 k 2025-05-07T19:42:49.5089628Z perl-Pod-Usage noarch 4:2.01-2.amzn2023.0.2 amazonlinux 41 k 2025-05-07T19:42:49.5090229Z perl-Scalar-List-Utils x86_64 4:1.56-459.amzn2023.0.2 amazonlinux 71 k 2025-05-07T19:42:49.5090866Z perl-SelectSaver noarch 1.02-477.amzn2023.0.6 amazonlinux 12 k 2025-05-07T19:42:49.5091448Z perl-Socket x86_64 4:2.032-1.amzn2023.0.2 amazonlinux 55 k 2025-05-07T19:42:49.5091985Z perl-Storable x86_64 1:3.21-458.amzn2023.0.2 amazonlinux 96 k 2025-05-07T19:42:49.5092555Z perl-Symbol noarch 1.08-477.amzn2023.0.6 amazonlinux 15 k 2025-05-07T19:42:49.5093141Z perl-Term-ANSIColor noarch 5.01-459.amzn2023.0.2 amazonlinux 48 k 2025-05-07T19:42:49.5093755Z perl-Term-Cap noarch 1.17-458.amzn2023.0.2 amazonlinux 22 k 2025-05-07T19:42:49.5094344Z perl-TermReadKey x86_64 2.38-9.amzn2023.0.2 amazonlinux 36 k 2025-05-07T19:42:49.5095028Z perl-Text-ParseWords noarch 3.30-458.amzn2023.0.2 amazonlinux 17 k 2025-05-07T19:42:49.5095684Z perl-Text-Tabs+Wrap noarch 2021.0726-1.amzn2023.0.1 amazonlinux 22 k 2025-05-07T19:42:49.5096401Z perl-Time-Local noarch 2:1.300-5.amzn2023.0.2 amazonlinux 34 k 2025-05-07T19:42:49.5096987Z perl-URI noarch 5.09-1.amzn2023.0.2 amazonlinux 108 k 2025-05-07T19:42:49.5097540Z perl-base noarch 2.27-477.amzn2023.0.6 amazonlinux 17 k 2025-05-07T19:42:49.5098098Z perl-constant noarch 1.33-459.amzn2023.0.2 amazonlinux 23 k 2025-05-07T19:42:49.5098663Z perl-if noarch 0.60.800-477.amzn2023.0.6 amazonlinux 14 k 2025-05-07T19:42:49.5099217Z perl-interpreter x86_64 4:5.32.1-477.amzn2023.0.6 amazonlinux 71 k 2025-05-07T19:42:49.5099774Z perl-lib x86_64 0.65-477.amzn2023.0.6 amazonlinux 15 k 2025-05-07T19:42:49.5100294Z perl-libnet noarch 3.13-2.amzn2023.0.2 amazonlinux 126 k 2025-05-07T19:42:49.5100834Z perl-libs x86_64 4:5.32.1-477.amzn2023.0.6 amazonlinux 2.0 M 2025-05-07T19:42:49.5101410Z perl-mro x86_64 1.23-477.amzn2023.0.6 amazonlinux 29 k 2025-05-07T19:42:49.5101939Z perl-overload noarch 1.31-477.amzn2023.0.6 amazonlinux 46 k 2025-05-07T19:42:49.5102528Z perl-overloading noarch 0.02-477.amzn2023.0.6 amazonlinux 13 k 2025-05-07T19:42:49.5103094Z perl-parent noarch 1:0.238-458.amzn2023.0.2 amazonlinux 14 k 2025-05-07T19:42:49.5103661Z perl-podlators noarch 1:4.14-458.amzn2023.0.2 amazonlinux 112 k 2025-05-07T19:42:49.5104220Z perl-subs noarch 1.03-477.amzn2023.0.6 amazonlinux 12 k 2025-05-07T19:42:49.5104745Z perl-vars noarch 1.05-477.amzn2023.0.6 amazonlinux 13 k 2025-05-07T19:42:49.5105288Z shadow-utils x86_64 2:4.9-12.amzn2023.0.4 amazonlinux 1.1 M 2025-05-07T19:42:49.5105830Z systemd-libs x86_64 252.23-3.amzn2023 amazonlinux 613 k 2025-05-07T19:42:49.5106363Z util-linux x86_64 2.37.4-1.amzn2023.0.4 amazonlinux 2.2 M 2025-05-07T19:42:49.5106893Z util-linux-core x86_64 2.37.4-1.amzn2023.0.4 amazonlinux 432 k 2025-05-07T19:42:49.5107452Z Installing weak dependencies: 2025-05-07T19:42:49.5107880Z nano-default-editor noarch 8.3-1.amzn2023 amazonlinux 10 k 2025-05-07T19:42:49.5108432Z perl-IO-Socket-IP noarch 0.41-3.amzn2023.0.2 amazonlinux 42 k 2025-05-07T19:42:49.5108995Z perl-IO-Socket-SSL noarch 2.075-1.amzn2023.0.2 amazonlinux 218 k 2025-05-07T19:42:49.5109538Z perl-Mozilla-CA noarch 20200520-4.amzn2023.0.2 amazonlinux 13 k 2025-05-07T19:42:49.5110064Z perl-NDBM_File x86_64 1.15-477.amzn2023.0.6 amazonlinux 23 k 2025-05-07T19:42:49.5110601Z sudo-python-plugin x86_64 1.9.15-1.p5.amzn2023.0.1 amazonlinux 56 k 2025-05-07T19:42:49.5110931Z 2025-05-07T19:42:49.5111024Z Transaction Summary 2025-05-07T19:42:49.5111292Z ======================================================================================== 2025-05-07T19:42:49.5111594Z Install 107 Packages 2025-05-07T19:42:49.5111742Z 2025-05-07T19:42:49.5111871Z Total download size: 38 M 2025-05-07T19:42:49.5112104Z Installed size: 151 M 2025-05-07T19:42:49.5112342Z Downloading Packages: 2025-05-07T19:42:49.7953816Z (1/107): cracklib-2.9.6-27.amzn2023.0.2.x86_64. 3.6 MB/s | 82 kB 00:00 2025-05-07T19:42:49.8038576Z (2/107): elfutils-debuginfod-client-0.188-3.amz 6.1 MB/s | 41 kB 00:00 2025-05-07T19:42:49.8141272Z (3/107): cyrus-sasl-lib-2.1.27-18.amzn2023.0.3. 19 MB/s | 786 kB 00:00 2025-05-07T19:42:49.8207465Z (4/107): findutils-4.8.0-2.amzn2023.0.2.x86_64. 45 MB/s | 539 kB 00:00 2025-05-07T19:42:49.8433764Z (5/107): binutils-2.41-50.amzn2023.0.3.x86_64.r 74 MB/s | 5.3 MB 00:00 2025-05-07T19:42:49.8448355Z (6/107): git-2.47.1-1.amzn2023.0.2.x86_64.rpm 1.9 MB/s | 54 kB 00:00 2025-05-07T19:42:49.8676697Z (7/107): gnutls-3.8.3-6.amzn2023.0.1.x86_64.rpm 73 MB/s | 1.1 MB 00:00 2025-05-07T19:42:49.8913691Z (8/107): git-core-2.47.1-1.amzn2023.0.2.x86_64. 68 MB/s | 4.7 MB 00:00 2025-05-07T19:42:49.9061059Z (9/107): git-core-doc-2.47.1-1.amzn2023.0.2.noa 52 MB/s | 2.8 MB 00:00 2025-05-07T19:42:49.9126919Z (10/107): groff-base-1.22.4-7.amzn2023.0.2.x86_ 26 MB/s | 1.0 MB 00:00 2025-05-07T19:42:49.9161606Z (11/107): gzip-1.12-1.amzn2023.0.1.x86_64.rpm 6.9 MB/s | 160 kB 00:00 2025-05-07T19:42:49.9217143Z (12/107): jansson-2.14-0.amzn2023.x86_64.rpm 5.5 MB/s | 46 kB 00:00 2025-05-07T19:42:49.9316181Z (13/107): hwdata-0.384-1.amzn2023.0.3.noarch.rp 89 MB/s | 1.6 MB 00:00 2025-05-07T19:42:49.9329561Z (14/107): kmod-libs-29-2.amzn2023.0.5.x86_64.rp 3.6 MB/s | 62 kB 00:00 2025-05-07T19:42:49.9365415Z (15/107): less-608-2.amzn2023.0.2.x86_64.rpm 13 MB/s | 168 kB 00:00 2025-05-07T19:42:49.9416150Z (16/107): libcbor-0.7.0-3.amzn2023.0.2.x86_64.r 7.1 MB/s | 57 kB 00:00 2025-05-07T19:42:49.9476612Z (17/107): libdb-5.3.28-49.amzn2023.0.2.x86_64.r 54 MB/s | 756 kB 00:00 2025-05-07T19:42:49.9487924Z (18/107): libeconf-0.4.0-1.amzn2023.0.3.x86_64. 2.3 MB/s | 28 kB 00:00 2025-05-07T19:42:49.9517249Z (19/107): libedit-3.1-38.20210714cvs.amzn2023.0 13 MB/s | 108 kB 00:00 2025-05-07T19:42:49.9576380Z (20/107): libfdisk-2.37.4-1.amzn2023.0.4.x86_64 19 MB/s | 153 kB 00:00 2025-05-07T19:42:49.9597309Z (21/107): libfido2-1.10.0-2.amzn2023.0.2.x86_64 9.5 MB/s | 95 kB 00:00 2025-05-07T19:42:49.9619975Z (22/107): libmetalink-0.1.3-14.amzn2023.0.2.x86 3.1 MB/s | 31 kB 00:00 2025-05-07T19:42:49.9671957Z (23/107): libpwquality-1.4.4-6.amzn2023.0.2.x86 16 MB/s | 106 kB 00:00 2025-05-07T19:42:49.9691583Z (24/107): libsemanage-3.4-5.amzn2023.0.2.x86_64 14 MB/s | 121 kB 00:00 2025-05-07T19:42:49.9712087Z (25/107): libutempter-1.2.1-4.amzn2023.0.2.x86_ 2.9 MB/s | 26 kB 00:00 2025-05-07T19:42:49.9757255Z (26/107): nano-default-editor-8.3-1.amzn2023.no 1.7 MB/s | 10 kB 00:00 2025-05-07T19:42:49.9812902Z (27/107): ncurses-6.2-4.20200222.amzn2023.0.6.x 40 MB/s | 394 kB 00:00 2025-05-07T19:42:49.9872705Z (28/107): nano-8.3-1.amzn2023.x86_64.rpm 39 MB/s | 706 kB 00:00 2025-05-07T19:42:49.9922329Z (29/107): nettle-3.10.1-1.amzn2023.0.1.x86_64.r 38 MB/s | 573 kB 00:00 2025-05-07T19:42:49.9957302Z (30/107): openldap-2.4.57-6.amzn2023.0.7.x86_64 20 MB/s | 256 kB 00:00 2025-05-07T19:42:50.0002581Z (31/107): openssh-8.7p1-8.amzn2023.0.14.x86_64. 36 MB/s | 454 kB 00:00 2025-05-07T19:42:50.0057423Z (32/107): openssh-clients-8.7p1-8.amzn2023.0.14 52 MB/s | 708 kB 00:00 2025-05-07T19:42:50.0109114Z (33/107): pam-1.5.1-8.amzn2023.0.4.x86_64.rpm 37 MB/s | 542 kB 00:00 2025-05-07T19:42:50.0137358Z (34/107): pciutils-3.7.0-3.amzn2023.0.2.x86_64. 7.6 MB/s | 93 kB 00:00 2025-05-07T19:42:50.0153041Z (35/107): pciutils-libs-3.7.0-3.amzn2023.0.2.x8 4.8 MB/s | 41 kB 00:00 2025-05-07T19:42:50.0170157Z (36/107): perl-AutoLoader-5.74-477.amzn2023.0.6 3.9 MB/s | 22 kB 00:00 2025-05-07T19:42:50.0228356Z (37/107): perl-B-1.80-477.amzn2023.0.6.x86_64.r 25 MB/s | 179 kB 00:00 2025-05-07T19:42:50.0243450Z (38/107): perl-Carp-1.50-458.amzn2023.0.2.noarc 3.4 MB/s | 29 kB 00:00 2025-05-07T19:42:50.0261589Z (39/107): perl-Class-Struct-0.66-477.amzn2023.0 2.4 MB/s | 22 kB 00:00 2025-05-07T19:42:50.0308116Z (40/107): perl-Data-Dumper-2.174-460.amzn2023.0 9.5 MB/s | 55 kB 00:00 2025-05-07T19:42:50.0325119Z (41/107): perl-Digest-1.20-1.amzn2023.0.2.noarc 3.5 MB/s | 26 kB 00:00 2025-05-07T19:42:50.0350488Z (42/107): perl-Digest-MD5-2.58-2.amzn2023.0.2.x 4.2 MB/s | 36 kB 00:00 2025-05-07T19:42:50.0411807Z (43/107): perl-DynaLoader-1.47-477.amzn2023.0.6 3.3 MB/s | 26 kB 00:00 2025-05-07T19:42:50.0506449Z (44/107): perl-Encode-3.15-462.amzn2023.0.2.x86 98 MB/s | 1.7 MB 00:00 2025-05-07T19:42:50.0519389Z (45/107): perl-Errno-1.30-477.amzn2023.0.6.x86_ 927 kB/s | 15 kB 00:00 2025-05-07T19:42:50.0537742Z (46/107): perl-Error-0.17029-5.amzn2023.0.2.noa 3.5 MB/s | 41 kB 00:00 2025-05-07T19:42:50.0581161Z (47/107): perl-Exporter-5.74-459.amzn2023.0.2.n 5.3 MB/s | 31 kB 00:00 2025-05-07T19:42:50.0599548Z (48/107): perl-Fcntl-1.13-477.amzn2023.0.6.x86_ 2.7 MB/s | 21 kB 00:00 2025-05-07T19:42:50.0616521Z (49/107): perl-File-Basename-2.85-477.amzn2023. 2.4 MB/s | 18 kB 00:00 2025-05-07T19:42:50.0639784Z (50/107): perl-File-Find-1.37-477.amzn2023.0.6. 4.8 MB/s | 26 kB 00:00 2025-05-07T19:42:50.0678878Z (51/107): perl-File-Path-2.18-2.amzn2023.0.2.no 6.4 MB/s | 36 kB 00:00 2025-05-07T19:42:50.0700882Z (52/107): perl-File-Temp-0.231.100-2.amzn2023.0 7.7 MB/s | 60 kB 00:00 2025-05-07T19:42:50.0716556Z (53/107): perl-File-stat-1.09-477.amzn2023.0.6. 2.3 MB/s | 17 kB 00:00 2025-05-07T19:42:50.0740455Z (54/107): perl-FileHandle-2.03-477.amzn2023.0.6 2.8 MB/s | 16 kB 00:00 2025-05-07T19:42:50.0776892Z (55/107): perl-Getopt-Std-1.12-477.amzn2023.0.6 3.1 MB/s | 16 kB 00:00 2025-05-07T19:42:50.0803065Z (56/107): perl-Getopt-Long-2.52-2.amzn2023.0.2. 7.7 MB/s | 60 kB 00:00 2025-05-07T19:42:50.0818534Z (57/107): perl-Git-2.47.1-1.amzn2023.0.2.noarch 5.6 MB/s | 42 kB 00:00 2025-05-07T19:42:50.0842245Z (58/107): perl-HTTP-Tiny-0.078-1.amzn2023.0.3.n 9.5 MB/s | 56 kB 00:00 2025-05-07T19:42:50.0875842Z (59/107): perl-IO-1.43-477.amzn2023.0.6.x86_64. 17 MB/s | 87 kB 00:00 2025-05-07T19:42:50.0900330Z (60/107): perl-IO-Socket-IP-0.41-3.amzn2023.0.2 5.4 MB/s | 42 kB 00:00 2025-05-07T19:42:50.0930872Z (61/107): perl-IO-Socket-SSL-2.075-1.amzn2023.0 25 MB/s | 218 kB 00:00 2025-05-07T19:42:50.0945271Z (62/107): perl-IPC-Open3-1.21-477.amzn2023.0.6. 3.9 MB/s | 23 kB 00:00 2025-05-07T19:42:50.0981123Z (63/107): perl-MIME-Base64-3.16-2.amzn2023.0.2. 6.5 MB/s | 31 kB 00:00 2025-05-07T19:42:50.1000402Z (64/107): perl-NDBM_File-1.15-477.amzn2023.0.6. 4.3 MB/s | 23 kB 00:00 2025-05-07T19:42:50.1017782Z (65/107): perl-Mozilla-CA-20200520-4.amzn2023.0 1.5 MB/s | 13 kB 00:00 2025-05-07T19:42:50.1064697Z (66/107): perl-Net-SSLeay-1.94-1.amzn2023.0.1.x 48 MB/s | 392 kB 00:00 2025-05-07T19:42:50.1094355Z (67/107): perl-POSIX-1.94-477.amzn2023.0.6.x86_ 11 MB/s | 97 kB 00:00 2025-05-07T19:42:50.1119370Z (68/107): perl-PathTools-3.78-459.amzn2023.0.2. 8.5 MB/s | 85 kB 00:00 2025-05-07T19:42:50.1138495Z (69/107): perl-Pod-Escapes-1.07-458.amzn2023.0. 3.1 MB/s | 20 kB 00:00 2025-05-07T19:42:50.1188368Z (70/107): perl-Pod-Perldoc-3.28.01-459.amzn2023 13 MB/s | 84 kB 00:00 2025-05-07T19:42:50.1221767Z (71/107): perl-Pod-Simple-3.42-2.amzn2023.0.2.n 22 MB/s | 215 kB 00:00 2025-05-07T19:42:50.1238416Z (72/107): perl-Pod-Usage-2.01-2.amzn2023.0.2.no 4.3 MB/s | 41 kB 00:00 2025-05-07T19:42:50.1265237Z (73/107): perl-Scalar-List-Utils-1.56-459.amzn2 10 MB/s | 71 kB 00:00 2025-05-07T19:42:50.1295676Z (74/107): perl-SelectSaver-1.02-477.amzn2023.0. 2.5 MB/s | 12 kB 00:00 2025-05-07T19:42:50.1315112Z (75/107): perl-Socket-2.032-1.amzn2023.0.2.x86_ 8.0 MB/s | 55 kB 00:00 2025-05-07T19:42:50.1345384Z (76/107): perl-Storable-3.21-458.amzn2023.0.2.x 13 MB/s | 96 kB 00:00 2025-05-07T19:42:50.1371141Z (77/107): perl-Symbol-1.08-477.amzn2023.0.6.noa 3.1 MB/s | 15 kB 00:00 2025-05-07T19:42:50.1390338Z (78/107): perl-Term-ANSIColor-5.01-459.amzn2023 7.4 MB/s | 48 kB 00:00 2025-05-07T19:42:50.1409494Z (79/107): perl-Term-Cap-1.17-458.amzn2023.0.2.n 3.7 MB/s | 22 kB 00:00 2025-05-07T19:42:50.1440561Z (80/107): perl-TermReadKey-2.38-9.amzn2023.0.2. 5.4 MB/s | 36 kB 00:00 2025-05-07T19:42:50.1451992Z (81/107): perl-Text-ParseWords-3.30-458.amzn202 2.8 MB/s | 17 kB 00:00 2025-05-07T19:42:50.1478918Z (82/107): perl-Text-Tabs+Wrap-2021.0726-1.amzn2 3.4 MB/s | 22 kB 00:00 2025-05-07T19:42:50.1522602Z (83/107): perl-URI-5.09-1.amzn2023.0.2.noarch.r 18 MB/s | 108 kB 00:00 2025-05-07T19:42:50.1540291Z (84/107): perl-Time-Local-1.300-5.amzn2023.0.2. 4.2 MB/s | 34 kB 00:00 2025-05-07T19:42:50.1553019Z (85/107): perl-base-2.27-477.amzn2023.0.6.noarc 2.3 MB/s | 17 kB 00:00 2025-05-07T19:42:50.1572567Z (86/107): perl-constant-1.33-459.amzn2023.0.2.n 5.0 MB/s | 23 kB 00:00 2025-05-07T19:42:50.1602075Z (87/107): perl-if-0.60.800-477.amzn2023.0.6.noa 3.3 MB/s | 14 kB 00:00 2025-05-07T19:42:50.1626073Z (88/107): perl-interpreter-5.32.1-477.amzn2023. 10 MB/s | 71 kB 00:00 2025-05-07T19:42:50.1644252Z (89/107): perl-lib-0.65-477.amzn2023.0.6.x86_64 2.3 MB/s | 15 kB 00:00 2025-05-07T19:42:50.1680582Z (90/107): perl-libnet-3.13-2.amzn2023.0.2.noarc 17 MB/s | 126 kB 00:00 2025-05-07T19:42:50.1759653Z (91/107): perl-mro-1.23-477.amzn2023.0.6.x86_64 2.5 MB/s | 29 kB 00:00 2025-05-07T19:42:50.1862132Z (92/107): perl-libs-5.32.1-477.amzn2023.0.6.x86 88 MB/s | 2.0 MB 00:00 2025-05-07T19:42:50.1877579Z (93/107): perl-overload-1.31-477.amzn2023.0.6.n 2.4 MB/s | 46 kB 00:00 2025-05-07T19:42:50.1899046Z (94/107): perl-overloading-0.02-477.amzn2023.0. 982 kB/s | 13 kB 00:00 2025-05-07T19:42:50.1939965Z (95/107): perl-parent-0.238-458.amzn2023.0.2.no 2.5 MB/s | 14 kB 00:00 2025-05-07T19:42:50.1964007Z (96/107): perl-podlators-4.14-458.amzn2023.0.2. 14 MB/s | 112 kB 00:00 2025-05-07T19:42:50.1985175Z (97/107): perl-subs-1.03-477.amzn2023.0.6.noarc 1.4 MB/s | 12 kB 00:00 2025-05-07T19:42:50.2031259Z (98/107): perl-vars-1.05-477.amzn2023.0.6.noarc 2.3 MB/s | 13 kB 00:00 2025-05-07T19:42:50.2113406Z (99/107): shadow-utils-4.9-12.amzn2023.0.4.x86_ 79 MB/s | 1.1 MB 00:00 2025-05-07T19:42:50.2191653Z (100/107): sudo-1.9.15-1.p5.amzn2023.0.1.x86_64 63 MB/s | 1.3 MB 00:00 2025-05-07T19:42:50.2207362Z (101/107): sudo-python-plugin-1.9.15-1.p5.amzn2 3.6 MB/s | 56 kB 00:00 2025-05-07T19:42:50.2258443Z (102/107): systemd-libs-252.23-3.amzn2023.x86_6 45 MB/s | 613 kB 00:00 2025-05-07T19:42:50.2330390Z (103/107): tar-1.34-1.amzn2023.0.4.x86_64.rpm 78 MB/s | 879 kB 00:00 2025-05-07T19:42:50.2467401Z (104/107): util-linux-2.37.4-1.amzn2023.0.4.x86 88 MB/s | 2.2 MB 00:00 2025-05-07T19:42:50.2520046Z (105/107): util-linux-core-2.37.4-1.amzn2023.0. 17 MB/s | 432 kB 00:00 2025-05-07T19:42:50.2565776Z (106/107): wget-1.21.3-1.amzn2023.0.4.x86_64.rp 37 MB/s | 779 kB 00:00 2025-05-07T19:42:50.2578645Z (107/107): which-2.21-26.amzn2023.0.2.x86_64.rp 4.3 MB/s | 42 kB 00:00 2025-05-07T19:42:50.2600349Z -------------------------------------------------------------------------------- 2025-05-07T19:42:50.2601396Z Total 50 MB/s | 38 MB 00:00 2025-05-07T19:42:51.3245310Z Running transaction check 2025-05-07T19:42:51.3716174Z Transaction check succeeded. 2025-05-07T19:42:51.3717023Z Running transaction test 2025-05-07T19:42:51.7386099Z Transaction test succeeded. 2025-05-07T19:42:51.7387999Z Running transaction 2025-05-07T19:42:52.4888920Z Preparing : 1/1 2025-05-07T19:42:52.5023078Z Installing : systemd-libs-252.23-3.amzn2023.x86_64 1/107 2025-05-07T19:42:52.5259392Z Installing : nettle-3.10.1-1.amzn2023.0.1.x86_64 2/107 2025-05-07T19:42:52.5456394Z Installing : gnutls-3.8.3-6.amzn2023.0.1.x86_64 3/107 2025-05-07T19:42:52.5497769Z Installing : util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 4/107 2025-05-07T19:42:52.5572148Z Running scriptlet: util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 4/107 2025-05-07T19:42:52.5661757Z Installing : pciutils-libs-3.7.0-3.amzn2023.0.2.x86_64 5/107 2025-05-07T19:42:52.5933374Z Installing : ncurses-6.2-4.20200222.amzn2023.0.6.x86_64 6/107 2025-05-07T19:42:52.5987780Z Installing : nano-8.3-1.amzn2023.x86_64 7/107 2025-05-07T19:42:52.6040004Z Installing : nano-default-editor-8.3-1.amzn2023.noarch 8/107 2025-05-07T19:42:52.6535756Z Installing : libsemanage-3.4-5.amzn2023.0.2.x86_64 9/107 2025-05-07T19:42:52.6597275Z Installing : shadow-utils-2:4.9-12.amzn2023.0.4.x86_64 10/107 2025-05-07T19:42:52.6920889Z Running scriptlet: libutempter-1.2.1-4.amzn2023.0.2.x86_64 11/107 2025-05-07T19:42:52.6967859Z Installing : libutempter-1.2.1-4.amzn2023.0.2.x86_64 11/107 2025-05-07T19:42:52.7019924Z Installing : libmetalink-0.1.3-14.amzn2023.0.2.x86_64 12/107 2025-05-07T19:42:52.7075917Z Installing : libfdisk-2.37.4-1.amzn2023.0.4.x86_64 13/107 2025-05-07T19:42:52.7126504Z Installing : libedit-3.1-38.20210714cvs.amzn2023.0.2.x86_64 14/107 2025-05-07T19:42:52.7264990Z Installing : libeconf-0.4.0-1.amzn2023.0.3.x86_64 15/107 2025-05-07T19:42:52.7316884Z Installing : libdb-5.3.28-49.amzn2023.0.2.x86_64 16/107 2025-05-07T19:42:52.7371042Z Installing : libcbor-0.7.0-3.amzn2023.0.2.x86_64 17/107 2025-05-07T19:42:52.7443372Z Installing : libfido2-1.10.0-2.amzn2023.0.2.x86_64 18/107 2025-05-07T19:42:52.7498666Z Installing : less-608-2.amzn2023.0.2.x86_64 19/107 2025-05-07T19:42:52.7547597Z Installing : kmod-libs-29-2.amzn2023.0.5.x86_64 20/107 2025-05-07T19:42:52.7977676Z Installing : jansson-2.14-0.amzn2023.x86_64 21/107 2025-05-07T19:42:52.8058370Z Installing : hwdata-0.384-1.amzn2023.0.3.noarch 22/107 2025-05-07T19:42:52.8209413Z Installing : gzip-1.12-1.amzn2023.0.1.x86_64 23/107 2025-05-07T19:42:52.8639787Z Installing : cracklib-2.9.6-27.amzn2023.0.2.x86_64 24/107 2025-05-07T19:42:52.8823764Z Installing : pam-1.5.1-8.amzn2023.0.4.x86_64 25/107 2025-05-07T19:42:52.9635656Z Installing : libpwquality-1.4.4-6.amzn2023.0.2.x86_64 26/107 2025-05-07T19:42:52.9637324Z Installing : util-linux-2.37.4-1.amzn2023.0.4.x86_64 27/107 2025-05-07T19:42:52.9638706Z warning: /etc/adjtime created as /etc/adjtime.rpmnew 2025-05-07T19:42:52.9639450Z 2025-05-07T19:42:52.9855858Z Running scriptlet: util-linux-2.37.4-1.amzn2023.0.4.x86_64 27/107 2025-05-07T19:42:53.0198049Z Running scriptlet: openssh-8.7p1-8.amzn2023.0.14.x86_64 28/107 2025-05-07T19:42:53.0399993Z Installing : openssh-8.7p1-8.amzn2023.0.14.x86_64 28/107 2025-05-07T19:42:53.0470897Z Installing : openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 29/107 2025-05-07T19:42:53.1581959Z Running scriptlet: openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 29/107 2025-05-07T19:42:53.3085709Z Installing : git-core-2.47.1-1.amzn2023.0.2.x86_64 30/107 2025-05-07T19:42:53.3231438Z Installing : git-core-doc-2.47.1-1.amzn2023.0.2.noarch 31/107 2025-05-07T19:42:53.3649765Z Running scriptlet: groff-base-1.22.4-7.amzn2023.0.2.x86_64 32/107 2025-05-07T19:42:53.3741424Z Installing : groff-base-1.22.4-7.amzn2023.0.2.x86_64 32/107 2025-05-07T19:42:53.3818181Z Running scriptlet: groff-base-1.22.4-7.amzn2023.0.2.x86_64 32/107 2025-05-07T19:42:53.3898137Z Installing : perl-Digest-1.20-1.amzn2023.0.2.noarch 33/107 2025-05-07T19:42:53.3985430Z Installing : perl-Digest-MD5-2.58-2.amzn2023.0.2.x86_64 34/107 2025-05-07T19:42:53.4047056Z Installing : perl-B-1.80-477.amzn2023.0.6.x86_64 35/107 2025-05-07T19:42:53.4096666Z Installing : perl-FileHandle-2.03-477.amzn2023.0.6.noarch 36/107 2025-05-07T19:42:53.4150200Z Installing : perl-AutoLoader-5.74-477.amzn2023.0.6.noarch 37/107 2025-05-07T19:42:53.4236183Z Installing : perl-Data-Dumper-2.174-460.amzn2023.0.2.x86_64 38/107 2025-05-07T19:42:53.4305599Z Installing : perl-libnet-3.13-2.amzn2023.0.2.noarch 39/107 2025-05-07T19:42:53.4404683Z Installing : perl-base-2.27-477.amzn2023.0.6.noarch 40/107 2025-05-07T19:42:53.4617332Z Installing : perl-URI-5.09-1.amzn2023.0.2.noarch 41/107 2025-05-07T19:42:53.4704622Z Installing : perl-Net-SSLeay-1.94-1.amzn2023.0.1.x86_64 42/107 2025-05-07T19:42:53.4759384Z Installing : perl-Text-Tabs+Wrap-2021.0726-1.amzn2023.0.1.noa 43/107 2025-05-07T19:42:53.4807858Z Installing : perl-Mozilla-CA-20200520-4.amzn2023.0.2.noarch 44/107 2025-05-07T19:42:53.4867937Z Installing : perl-if-0.60.800-477.amzn2023.0.6.noarch 45/107 2025-05-07T19:42:53.4935484Z Installing : perl-IO-Socket-IP-0.41-3.amzn2023.0.2.noarch 46/107 2025-05-07T19:42:53.4994391Z Installing : perl-Time-Local-2:1.300-5.amzn2023.0.2.noarch 47/107 2025-05-07T19:42:53.5081166Z Installing : perl-File-Path-2.18-2.amzn2023.0.2.noarch 48/107 2025-05-07T19:42:53.5150161Z Installing : perl-IO-Socket-SSL-2.075-1.amzn2023.0.2.noarch 49/107 2025-05-07T19:42:53.5200494Z Installing : perl-Pod-Escapes-1:1.07-458.amzn2023.0.2.noarch 50/107 2025-05-07T19:42:53.5258235Z Installing : perl-Class-Struct-0.66-477.amzn2023.0.6.noarch 51/107 2025-05-07T19:42:53.5320902Z Installing : perl-POSIX-1.94-477.amzn2023.0.6.x86_64 52/107 2025-05-07T19:42:53.5373884Z Installing : perl-Term-ANSIColor-5.01-459.amzn2023.0.2.noarch 53/107 2025-05-07T19:42:53.5418159Z Installing : perl-IPC-Open3-1.21-477.amzn2023.0.6.noarch 54/107 2025-05-07T19:42:53.5477635Z Installing : perl-subs-1.03-477.amzn2023.0.6.noarch 55/107 2025-05-07T19:42:53.5548698Z Installing : perl-File-Temp-1:0.231.100-2.amzn2023.0.2.noarch 56/107 2025-05-07T19:42:53.5606141Z Installing : perl-HTTP-Tiny-0.078-1.amzn2023.0.3.noarch 57/107 2025-05-07T19:42:53.5719834Z Installing : perl-Term-Cap-1.17-458.amzn2023.0.2.noarch 58/107 2025-05-07T19:42:53.5803825Z Installing : perl-Pod-Simple-1:3.42-2.amzn2023.0.2.noarch 59/107 2025-05-07T19:42:53.5858263Z Installing : perl-Socket-4:2.032-1.amzn2023.0.2.x86_64 60/107 2025-05-07T19:42:53.5907005Z Installing : perl-SelectSaver-1.02-477.amzn2023.0.6.noarch 61/107 2025-05-07T19:42:53.5952817Z Installing : perl-Symbol-1.08-477.amzn2023.0.6.noarch 62/107 2025-05-07T19:42:53.6026231Z Installing : perl-File-stat-1.09-477.amzn2023.0.6.noarch 63/107 2025-05-07T19:42:53.6127701Z Installing : perl-podlators-1:4.14-458.amzn2023.0.2.noarch 64/107 2025-05-07T19:42:53.6202359Z Installing : perl-Pod-Perldoc-3.28.01-459.amzn2023.0.3.noarch 65/107 2025-05-07T19:42:53.6258597Z Installing : perl-Fcntl-1.13-477.amzn2023.0.6.x86_64 66/107 2025-05-07T19:42:53.6314155Z Installing : perl-Text-ParseWords-3.30-458.amzn2023.0.2.noarc 67/107 2025-05-07T19:42:53.6383991Z Installing : perl-mro-1.23-477.amzn2023.0.6.x86_64 68/107 2025-05-07T19:42:53.6449850Z Installing : perl-IO-1.43-477.amzn2023.0.6.x86_64 69/107 2025-05-07T19:42:53.6506760Z Installing : perl-overloading-0.02-477.amzn2023.0.6.noarch 70/107 2025-05-07T19:42:53.6577750Z Installing : perl-Pod-Usage-4:2.01-2.amzn2023.0.2.noarch 71/107 2025-05-07T19:42:53.6638683Z Installing : perl-Errno-1.30-477.amzn2023.0.6.x86_64 72/107 2025-05-07T19:42:53.6686717Z Installing : perl-File-Basename-2.85-477.amzn2023.0.6.noarch 73/107 2025-05-07T19:42:53.6746495Z Installing : perl-Getopt-Std-1.12-477.amzn2023.0.6.noarch 74/107 2025-05-07T19:42:53.6825993Z Installing : perl-MIME-Base64-3.16-2.amzn2023.0.2.x86_64 75/107 2025-05-07T19:42:53.6902375Z Installing : perl-Scalar-List-Utils-4:1.56-459.amzn2023.0.2.x 76/107 2025-05-07T19:42:53.6973521Z Installing : perl-constant-1.33-459.amzn2023.0.2.noarch 77/107 2025-05-07T19:42:53.7041359Z Installing : perl-Storable-1:3.21-458.amzn2023.0.2.x86_64 78/107 2025-05-07T19:42:53.7089021Z Installing : perl-overload-1.31-477.amzn2023.0.6.noarch 79/107 2025-05-07T19:42:53.7145973Z Installing : perl-parent-1:0.238-458.amzn2023.0.2.noarch 80/107 2025-05-07T19:42:53.7209970Z Installing : perl-vars-1.05-477.amzn2023.0.6.noarch 81/107 2025-05-07T19:42:53.7266919Z Installing : perl-Getopt-Long-1:2.52-2.amzn2023.0.2.noarch 82/107 2025-05-07T19:42:53.7323658Z Installing : perl-DynaLoader-1.47-477.amzn2023.0.6.x86_64 83/107 2025-05-07T19:42:53.7377466Z Installing : perl-Carp-1.50-458.amzn2023.0.2.noarch 84/107 2025-05-07T19:42:53.7434261Z Installing : perl-Exporter-5.74-459.amzn2023.0.2.noarch 85/107 2025-05-07T19:42:53.7513241Z Installing : perl-NDBM_File-1.15-477.amzn2023.0.6.x86_64 86/107 2025-05-07T19:42:53.8051749Z Installing : perl-PathTools-3.78-459.amzn2023.0.2.x86_64 87/107 2025-05-07T19:42:53.9017551Z Installing : perl-Encode-4:3.15-462.amzn2023.0.2.x86_64 88/107 2025-05-07T19:42:53.9147991Z Installing : perl-libs-4:5.32.1-477.amzn2023.0.6.x86_64 89/107 2025-05-07T19:42:53.9225226Z Installing : perl-interpreter-4:5.32.1-477.amzn2023.0.6.x86_6 90/107 2025-05-07T19:42:53.9297605Z Installing : perl-Error-1:0.17029-5.amzn2023.0.2.noarch 91/107 2025-05-07T19:42:53.9361155Z Installing : perl-File-Find-1.37-477.amzn2023.0.6.noarch 92/107 2025-05-07T19:42:53.9434041Z Installing : perl-TermReadKey-2.38-9.amzn2023.0.2.x86_64 93/107 2025-05-07T19:42:53.9479867Z Installing : perl-lib-0.65-477.amzn2023.0.6.x86_64 94/107 2025-05-07T19:42:53.9548711Z Installing : perl-Git-2.47.1-1.amzn2023.0.2.noarch 95/107 2025-05-07T19:42:53.9625674Z Installing : git-2.47.1-1.amzn2023.0.2.x86_64 96/107 2025-05-07T19:42:53.9831241Z Installing : elfutils-debuginfod-client-0.188-3.amzn2023.0.2. 97/107 2025-05-07T19:42:53.9956298Z Installing : cyrus-sasl-lib-2.1.27-18.amzn2023.0.3.x86_64 98/107 2025-05-07T19:42:54.0047928Z Installing : openldap-2.4.57-6.amzn2023.0.7.x86_64 99/107 2025-05-07T19:42:54.0450905Z Installing : sudo-python-plugin-1.9.15-1.p5.amzn2023.0.1.x86_ 100/107 2025-05-07T19:42:54.1679817Z Installing : sudo-1.9.15-1.p5.amzn2023.0.1.x86_64 101/107 2025-05-07T19:42:54.1773742Z Installing : binutils-2.41-50.amzn2023.0.3.x86_64 102/107 2025-05-07T19:42:54.1898253Z Running scriptlet: binutils-2.41-50.amzn2023.0.3.x86_64 102/107 2025-05-07T19:42:54.2197971Z Installing : pciutils-3.7.0-3.amzn2023.0.2.x86_64 103/107 2025-05-07T19:42:54.2298895Z Installing : wget-1.21.3-1.amzn2023.0.4.x86_64 104/107 2025-05-07T19:42:54.2545253Z Installing : which-2.21-26.amzn2023.0.2.x86_64 105/107 2025-05-07T19:42:54.2763163Z Installing : tar-2:1.34-1.amzn2023.0.4.x86_64 106/107 2025-05-07T19:42:54.2849787Z Installing : findutils-1:4.8.0-2.amzn2023.0.2.x86_64 107/107 2025-05-07T19:42:54.2968241Z Running scriptlet: pam-1.5.1-8.amzn2023.0.4.x86_64 107/107 2025-05-07T19:42:55.0613371Z Running scriptlet: findutils-1:4.8.0-2.amzn2023.0.2.x86_64 107/107 2025-05-07T19:42:55.0616062Z Verifying : binutils-2.41-50.amzn2023.0.3.x86_64 1/107 2025-05-07T19:42:55.0617803Z Verifying : cracklib-2.9.6-27.amzn2023.0.2.x86_64 2/107 2025-05-07T19:42:55.0619571Z Verifying : cyrus-sasl-lib-2.1.27-18.amzn2023.0.3.x86_64 3/107 2025-05-07T19:42:55.0620939Z Verifying : elfutils-debuginfod-client-0.188-3.amzn2023.0.2. 4/107 2025-05-07T19:42:55.0621570Z Verifying : findutils-1:4.8.0-2.amzn2023.0.2.x86_64 5/107 2025-05-07T19:42:55.0622177Z Verifying : git-2.47.1-1.amzn2023.0.2.x86_64 6/107 2025-05-07T19:42:55.0622811Z Verifying : git-core-2.47.1-1.amzn2023.0.2.x86_64 7/107 2025-05-07T19:42:55.0623501Z Verifying : git-core-doc-2.47.1-1.amzn2023.0.2.noarch 8/107 2025-05-07T19:42:55.0624395Z Verifying : gnutls-3.8.3-6.amzn2023.0.1.x86_64 9/107 2025-05-07T19:42:55.0625102Z Verifying : groff-base-1.22.4-7.amzn2023.0.2.x86_64 10/107 2025-05-07T19:42:55.0625750Z Verifying : gzip-1.12-1.amzn2023.0.1.x86_64 11/107 2025-05-07T19:42:55.0626338Z Verifying : hwdata-0.384-1.amzn2023.0.3.noarch 12/107 2025-05-07T19:42:55.0627110Z Verifying : jansson-2.14-0.amzn2023.x86_64 13/107 2025-05-07T19:42:55.0627699Z Verifying : kmod-libs-29-2.amzn2023.0.5.x86_64 14/107 2025-05-07T19:42:55.0628311Z Verifying : less-608-2.amzn2023.0.2.x86_64 15/107 2025-05-07T19:42:55.0629009Z Verifying : libcbor-0.7.0-3.amzn2023.0.2.x86_64 16/107 2025-05-07T19:42:55.0629590Z Verifying : libdb-5.3.28-49.amzn2023.0.2.x86_64 17/107 2025-05-07T19:42:55.0630210Z Verifying : libeconf-0.4.0-1.amzn2023.0.3.x86_64 18/107 2025-05-07T19:42:55.0630864Z Verifying : libedit-3.1-38.20210714cvs.amzn2023.0.2.x86_64 19/107 2025-05-07T19:42:55.0631506Z Verifying : libfdisk-2.37.4-1.amzn2023.0.4.x86_64 20/107 2025-05-07T19:42:55.0632130Z Verifying : libfido2-1.10.0-2.amzn2023.0.2.x86_64 21/107 2025-05-07T19:42:55.0632837Z Verifying : libmetalink-0.1.3-14.amzn2023.0.2.x86_64 22/107 2025-05-07T19:42:55.0633497Z Verifying : libpwquality-1.4.4-6.amzn2023.0.2.x86_64 23/107 2025-05-07T19:42:55.0634103Z Verifying : libsemanage-3.4-5.amzn2023.0.2.x86_64 24/107 2025-05-07T19:42:55.0634805Z Verifying : libutempter-1.2.1-4.amzn2023.0.2.x86_64 25/107 2025-05-07T19:42:55.0635454Z Verifying : nano-8.3-1.amzn2023.x86_64 26/107 2025-05-07T19:42:55.0636057Z Verifying : nano-default-editor-8.3-1.amzn2023.noarch 27/107 2025-05-07T19:42:55.0636767Z Verifying : ncurses-6.2-4.20200222.amzn2023.0.6.x86_64 28/107 2025-05-07T19:42:55.0637349Z Verifying : nettle-3.10.1-1.amzn2023.0.1.x86_64 29/107 2025-05-07T19:42:55.0638002Z Verifying : openldap-2.4.57-6.amzn2023.0.7.x86_64 30/107 2025-05-07T19:42:55.0638702Z Verifying : openssh-8.7p1-8.amzn2023.0.14.x86_64 31/107 2025-05-07T19:42:55.0639302Z Verifying : openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 32/107 2025-05-07T19:42:55.0639991Z Verifying : pam-1.5.1-8.amzn2023.0.4.x86_64 33/107 2025-05-07T19:42:55.0640632Z Verifying : pciutils-3.7.0-3.amzn2023.0.2.x86_64 34/107 2025-05-07T19:42:55.0641286Z Verifying : pciutils-libs-3.7.0-3.amzn2023.0.2.x86_64 35/107 2025-05-07T19:42:55.0641946Z Verifying : perl-AutoLoader-5.74-477.amzn2023.0.6.noarch 36/107 2025-05-07T19:42:55.0642764Z Verifying : perl-B-1.80-477.amzn2023.0.6.x86_64 37/107 2025-05-07T19:42:55.0643422Z Verifying : perl-Carp-1.50-458.amzn2023.0.2.noarch 38/107 2025-05-07T19:42:55.0644036Z Verifying : perl-Class-Struct-0.66-477.amzn2023.0.6.noarch 39/107 2025-05-07T19:42:55.0644736Z Verifying : perl-Data-Dumper-2.174-460.amzn2023.0.2.x86_64 40/107 2025-05-07T19:42:55.0645403Z Verifying : perl-Digest-1.20-1.amzn2023.0.2.noarch 41/107 2025-05-07T19:42:55.0646018Z Verifying : perl-Digest-MD5-2.58-2.amzn2023.0.2.x86_64 42/107 2025-05-07T19:42:55.0646711Z Verifying : perl-DynaLoader-1.47-477.amzn2023.0.6.x86_64 43/107 2025-05-07T19:42:55.0647319Z Verifying : perl-Encode-4:3.15-462.amzn2023.0.2.x86_64 44/107 2025-05-07T19:42:55.0647960Z Verifying : perl-Errno-1.30-477.amzn2023.0.6.x86_64 45/107 2025-05-07T19:42:55.0648640Z Verifying : perl-Error-1:0.17029-5.amzn2023.0.2.noarch 46/107 2025-05-07T19:42:55.0649327Z Verifying : perl-Exporter-5.74-459.amzn2023.0.2.noarch 47/107 2025-05-07T19:42:55.0649988Z Verifying : perl-Fcntl-1.13-477.amzn2023.0.6.x86_64 48/107 2025-05-07T19:42:55.0650604Z Verifying : perl-File-Basename-2.85-477.amzn2023.0.6.noarch 49/107 2025-05-07T19:42:55.0651303Z Verifying : perl-File-Find-1.37-477.amzn2023.0.6.noarch 50/107 2025-05-07T19:42:55.0651909Z Verifying : perl-File-Path-2.18-2.amzn2023.0.2.noarch 51/107 2025-05-07T19:42:55.0652514Z Verifying : perl-File-Temp-1:0.231.100-2.amzn2023.0.2.noarch 52/107 2025-05-07T19:42:55.0653039Z Verifying : perl-File-stat-1.09-477.amzn2023.0.6.noarch 53/107 2025-05-07T19:42:55.0653600Z Verifying : perl-FileHandle-2.03-477.amzn2023.0.6.noarch 54/107 2025-05-07T19:42:55.0654148Z Verifying : perl-Getopt-Long-1:2.52-2.amzn2023.0.2.noarch 55/107 2025-05-07T19:42:55.0654717Z Verifying : perl-Getopt-Std-1.12-477.amzn2023.0.6.noarch 56/107 2025-05-07T19:42:55.0655369Z Verifying : perl-Git-2.47.1-1.amzn2023.0.2.noarch 57/107 2025-05-07T19:42:55.0655922Z Verifying : perl-HTTP-Tiny-0.078-1.amzn2023.0.3.noarch 58/107 2025-05-07T19:42:55.0656466Z Verifying : perl-IO-1.43-477.amzn2023.0.6.x86_64 59/107 2025-05-07T19:42:55.0656989Z Verifying : perl-IO-Socket-IP-0.41-3.amzn2023.0.2.noarch 60/107 2025-05-07T19:42:55.0657558Z Verifying : perl-IO-Socket-SSL-2.075-1.amzn2023.0.2.noarch 61/107 2025-05-07T19:42:55.0658104Z Verifying : perl-IPC-Open3-1.21-477.amzn2023.0.6.noarch 62/107 2025-05-07T19:42:55.0658668Z Verifying : perl-MIME-Base64-3.16-2.amzn2023.0.2.x86_64 63/107 2025-05-07T19:42:55.0659272Z Verifying : perl-Mozilla-CA-20200520-4.amzn2023.0.2.noarch 64/107 2025-05-07T19:42:55.0659813Z Verifying : perl-NDBM_File-1.15-477.amzn2023.0.6.x86_64 65/107 2025-05-07T19:42:55.0660349Z Verifying : perl-Net-SSLeay-1.94-1.amzn2023.0.1.x86_64 66/107 2025-05-07T19:42:55.0660872Z Verifying : perl-POSIX-1.94-477.amzn2023.0.6.x86_64 67/107 2025-05-07T19:42:55.0661429Z Verifying : perl-PathTools-3.78-459.amzn2023.0.2.x86_64 68/107 2025-05-07T19:42:55.0661989Z Verifying : perl-Pod-Escapes-1:1.07-458.amzn2023.0.2.noarch 69/107 2025-05-07T19:42:55.0662533Z Verifying : perl-Pod-Perldoc-3.28.01-459.amzn2023.0.3.noarch 70/107 2025-05-07T19:42:55.0663099Z Verifying : perl-Pod-Simple-1:3.42-2.amzn2023.0.2.noarch 71/107 2025-05-07T19:42:55.0663627Z Verifying : perl-Pod-Usage-4:2.01-2.amzn2023.0.2.noarch 72/107 2025-05-07T19:42:55.0664184Z Verifying : perl-Scalar-List-Utils-4:1.56-459.amzn2023.0.2.x 73/107 2025-05-07T19:42:55.0664839Z Verifying : perl-SelectSaver-1.02-477.amzn2023.0.6.noarch 74/107 2025-05-07T19:42:55.0665395Z Verifying : perl-Socket-4:2.032-1.amzn2023.0.2.x86_64 75/107 2025-05-07T19:42:55.0665941Z Verifying : perl-Storable-1:3.21-458.amzn2023.0.2.x86_64 76/107 2025-05-07T19:42:55.0666465Z Verifying : perl-Symbol-1.08-477.amzn2023.0.6.noarch 77/107 2025-05-07T19:42:55.0667030Z Verifying : perl-Term-ANSIColor-5.01-459.amzn2023.0.2.noarch 78/107 2025-05-07T19:42:55.0667576Z Verifying : perl-Term-Cap-1.17-458.amzn2023.0.2.noarch 79/107 2025-05-07T19:42:55.0668132Z Verifying : perl-TermReadKey-2.38-9.amzn2023.0.2.x86_64 80/107 2025-05-07T19:42:55.0668699Z Verifying : perl-Text-ParseWords-3.30-458.amzn2023.0.2.noarc 81/107 2025-05-07T19:42:55.0669250Z Verifying : perl-Text-Tabs+Wrap-2021.0726-1.amzn2023.0.1.noa 82/107 2025-05-07T19:42:55.0669801Z Verifying : perl-Time-Local-2:1.300-5.amzn2023.0.2.noarch 83/107 2025-05-07T19:42:55.0670502Z Verifying : perl-URI-5.09-1.amzn2023.0.2.noarch 84/107 2025-05-07T19:42:55.0671021Z Verifying : perl-base-2.27-477.amzn2023.0.6.noarch 85/107 2025-05-07T19:42:55.0671539Z Verifying : perl-constant-1.33-459.amzn2023.0.2.noarch 86/107 2025-05-07T19:42:55.0672066Z Verifying : perl-if-0.60.800-477.amzn2023.0.6.noarch 87/107 2025-05-07T19:42:55.0672588Z Verifying : perl-interpreter-4:5.32.1-477.amzn2023.0.6.x86_6 88/107 2025-05-07T19:42:55.0673097Z Verifying : perl-lib-0.65-477.amzn2023.0.6.x86_64 89/107 2025-05-07T19:42:55.0673619Z Verifying : perl-libnet-3.13-2.amzn2023.0.2.noarch 90/107 2025-05-07T19:42:55.0674125Z Verifying : perl-libs-4:5.32.1-477.amzn2023.0.6.x86_64 91/107 2025-05-07T19:42:55.0674867Z Verifying : perl-mro-1.23-477.amzn2023.0.6.x86_64 92/107 2025-05-07T19:42:55.0675597Z Verifying : perl-overload-1.31-477.amzn2023.0.6.noarch 93/107 2025-05-07T19:42:55.0676148Z Verifying : perl-overloading-0.02-477.amzn2023.0.6.noarch 94/107 2025-05-07T19:42:55.0676705Z Verifying : perl-parent-1:0.238-458.amzn2023.0.2.noarch 95/107 2025-05-07T19:42:55.0677230Z Verifying : perl-podlators-1:4.14-458.amzn2023.0.2.noarch 96/107 2025-05-07T19:42:55.0677784Z Verifying : perl-subs-1.03-477.amzn2023.0.6.noarch 97/107 2025-05-07T19:42:55.0678305Z Verifying : perl-vars-1.05-477.amzn2023.0.6.noarch 98/107 2025-05-07T19:42:55.0678845Z Verifying : shadow-utils-2:4.9-12.amzn2023.0.4.x86_64 99/107 2025-05-07T19:42:55.0679373Z Verifying : sudo-1.9.15-1.p5.amzn2023.0.1.x86_64 100/107 2025-05-07T19:42:55.0679899Z Verifying : sudo-python-plugin-1.9.15-1.p5.amzn2023.0.1.x86_ 101/107 2025-05-07T19:42:55.0680469Z Verifying : systemd-libs-252.23-3.amzn2023.x86_64 102/107 2025-05-07T19:42:55.0680977Z Verifying : tar-2:1.34-1.amzn2023.0.4.x86_64 103/107 2025-05-07T19:42:55.0681493Z Verifying : util-linux-2.37.4-1.amzn2023.0.4.x86_64 104/107 2025-05-07T19:42:55.0682019Z Verifying : util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 105/107 2025-05-07T19:42:55.0682552Z Verifying : wget-1.21.3-1.amzn2023.0.4.x86_64 106/107 2025-05-07T19:42:55.1838914Z Verifying : which-2.21-26.amzn2023.0.2.x86_64 107/107 2025-05-07T19:42:55.1839309Z 2025-05-07T19:42:55.1840099Z Installed: 2025-05-07T19:42:55.1840568Z binutils-2.41-50.amzn2023.0.3.x86_64 2025-05-07T19:42:55.1841126Z cracklib-2.9.6-27.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1841692Z cyrus-sasl-lib-2.1.27-18.amzn2023.0.3.x86_64 2025-05-07T19:42:55.1842599Z elfutils-debuginfod-client-0.188-3.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1843188Z findutils-1:4.8.0-2.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1843833Z git-2.47.1-1.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1844407Z git-core-2.47.1-1.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1844910Z git-core-doc-2.47.1-1.amzn2023.0.2.noarch 2025-05-07T19:42:55.1845388Z gnutls-3.8.3-6.amzn2023.0.1.x86_64 2025-05-07T19:42:55.1845878Z groff-base-1.22.4-7.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1846346Z gzip-1.12-1.amzn2023.0.1.x86_64 2025-05-07T19:42:55.1846833Z hwdata-0.384-1.amzn2023.0.3.noarch 2025-05-07T19:42:55.1847460Z jansson-2.14-0.amzn2023.x86_64 2025-05-07T19:42:55.1847940Z kmod-libs-29-2.amzn2023.0.5.x86_64 2025-05-07T19:42:55.1848427Z less-608-2.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1848940Z libcbor-0.7.0-3.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1849409Z libdb-5.3.28-49.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1849895Z libeconf-0.4.0-1.amzn2023.0.3.x86_64 2025-05-07T19:42:55.1850393Z libedit-3.1-38.20210714cvs.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1850909Z libfdisk-2.37.4-1.amzn2023.0.4.x86_64 2025-05-07T19:42:55.1851454Z libfido2-1.10.0-2.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1851965Z libmetalink-0.1.3-14.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1852480Z libpwquality-1.4.4-6.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1853001Z libsemanage-3.4-5.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1853497Z libutempter-1.2.1-4.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1853986Z nano-8.3-1.amzn2023.x86_64 2025-05-07T19:42:55.1854497Z nano-default-editor-8.3-1.amzn2023.noarch 2025-05-07T19:42:55.1855138Z ncurses-6.2-4.20200222.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1855858Z nettle-3.10.1-1.amzn2023.0.1.x86_64 2025-05-07T19:42:55.1856449Z openldap-2.4.57-6.amzn2023.0.7.x86_64 2025-05-07T19:42:55.1856998Z openssh-8.7p1-8.amzn2023.0.14.x86_64 2025-05-07T19:42:55.1857563Z openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 2025-05-07T19:42:55.1858113Z pam-1.5.1-8.amzn2023.0.4.x86_64 2025-05-07T19:42:55.1858627Z pciutils-3.7.0-3.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1859152Z pciutils-libs-3.7.0-3.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1859719Z perl-AutoLoader-5.74-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1860263Z perl-B-1.80-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1860807Z perl-Carp-1.50-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.1861597Z perl-Class-Struct-0.66-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1862213Z perl-Data-Dumper-2.174-460.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1862871Z perl-Digest-1.20-1.amzn2023.0.2.noarch 2025-05-07T19:42:55.1863386Z perl-Digest-MD5-2.58-2.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1863930Z perl-DynaLoader-1.47-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1864446Z perl-Encode-4:3.15-462.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1864966Z perl-Errno-1.30-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1865491Z perl-Error-1:0.17029-5.amzn2023.0.2.noarch 2025-05-07T19:42:55.1866009Z perl-Exporter-5.74-459.amzn2023.0.2.noarch 2025-05-07T19:42:55.1866545Z perl-Fcntl-1.13-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1867071Z perl-File-Basename-2.85-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1867636Z perl-File-Find-1.37-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1868222Z perl-File-Path-2.18-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.1868762Z perl-File-Temp-1:0.231.100-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.1869300Z perl-File-stat-1.09-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1869833Z perl-FileHandle-2.03-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1870391Z perl-Getopt-Long-1:2.52-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.1870915Z perl-Getopt-Std-1.12-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1871444Z perl-Git-2.47.1-1.amzn2023.0.2.noarch 2025-05-07T19:42:55.1871974Z perl-HTTP-Tiny-0.078-1.amzn2023.0.3.noarch 2025-05-07T19:42:55.1872472Z perl-IO-1.43-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1873168Z perl-IO-Socket-IP-0.41-3.amzn2023.0.2.noarch 2025-05-07T19:42:55.1873713Z perl-IO-Socket-SSL-2.075-1.amzn2023.0.2.noarch 2025-05-07T19:42:55.1874270Z perl-IPC-Open3-1.21-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1875112Z perl-MIME-Base64-3.16-2.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1875888Z perl-Mozilla-CA-20200520-4.amzn2023.0.2.noarch 2025-05-07T19:42:55.1876500Z perl-NDBM_File-1.15-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1877060Z perl-Net-SSLeay-1.94-1.amzn2023.0.1.x86_64 2025-05-07T19:42:55.1877647Z perl-POSIX-1.94-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1878221Z perl-PathTools-3.78-459.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1878827Z perl-Pod-Escapes-1:1.07-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.1879430Z perl-Pod-Perldoc-3.28.01-459.amzn2023.0.3.noarch 2025-05-07T19:42:55.1880003Z perl-Pod-Simple-1:3.42-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.1880584Z perl-Pod-Usage-4:2.01-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.1881154Z perl-Scalar-List-Utils-4:1.56-459.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1881770Z perl-SelectSaver-1.02-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1882344Z perl-Socket-4:2.032-1.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1882916Z perl-Storable-1:3.21-458.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1883501Z perl-Symbol-1.08-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1884233Z perl-Term-ANSIColor-5.01-459.amzn2023.0.2.noarch 2025-05-07T19:42:55.1884841Z perl-Term-Cap-1.17-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.1885419Z perl-TermReadKey-2.38-9.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1886026Z perl-Text-ParseWords-3.30-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.1886618Z perl-Text-Tabs+Wrap-2021.0726-1.amzn2023.0.1.noarch 2025-05-07T19:42:55.1887202Z perl-Time-Local-2:1.300-5.amzn2023.0.2.noarch 2025-05-07T19:42:55.1887976Z perl-URI-5.09-1.amzn2023.0.2.noarch 2025-05-07T19:42:55.1888468Z perl-base-2.27-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1888996Z perl-constant-1.33-459.amzn2023.0.2.noarch 2025-05-07T19:42:55.1889503Z perl-if-0.60.800-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1890193Z perl-interpreter-4:5.32.1-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1890716Z perl-lib-0.65-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1891221Z perl-libnet-3.13-2.amzn2023.0.2.noarch 2025-05-07T19:42:55.1891742Z perl-libs-4:5.32.1-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1892226Z perl-mro-1.23-477.amzn2023.0.6.x86_64 2025-05-07T19:42:55.1892754Z perl-overload-1.31-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1893293Z perl-overloading-0.02-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1893842Z perl-parent-1:0.238-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.1894376Z perl-podlators-1:4.14-458.amzn2023.0.2.noarch 2025-05-07T19:42:55.1894974Z perl-subs-1.03-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1895700Z perl-vars-1.05-477.amzn2023.0.6.noarch 2025-05-07T19:42:55.1896241Z shadow-utils-2:4.9-12.amzn2023.0.4.x86_64 2025-05-07T19:42:55.1896778Z sudo-1.9.15-1.p5.amzn2023.0.1.x86_64 2025-05-07T19:42:55.1897334Z sudo-python-plugin-1.9.15-1.p5.amzn2023.0.1.x86_64 2025-05-07T19:42:55.1897895Z systemd-libs-252.23-3.amzn2023.x86_64 2025-05-07T19:42:55.1898418Z tar-2:1.34-1.amzn2023.0.4.x86_64 2025-05-07T19:42:55.1898919Z util-linux-2.37.4-1.amzn2023.0.4.x86_64 2025-05-07T19:42:55.1899472Z util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 2025-05-07T19:42:55.1899992Z wget-1.21.3-1.amzn2023.0.4.x86_64 2025-05-07T19:42:55.1900535Z which-2.21-26.amzn2023.0.2.x86_64 2025-05-07T19:42:55.1900845Z 2025-05-07T19:42:55.1900957Z Complete! 2025-05-07T19:42:55.2672629Z ##[group]Run actions/checkout@v4 2025-05-07T19:42:55.2672971Z with: 2025-05-07T19:42:55.2673226Z submodules: true 2025-05-07T19:42:55.2673474Z repository: pytorch/FBGEMM 2025-05-07T19:42:55.2673998Z token: *** 2025-05-07T19:42:55.2674223Z ssh-strict: true 2025-05-07T19:42:55.2674487Z ssh-user: git 2025-05-07T19:42:55.2674926Z persist-credentials: true 2025-05-07T19:42:55.2675421Z clean: true 2025-05-07T19:42:55.2675827Z sparse-checkout-cone-mode: true 2025-05-07T19:42:55.2676161Z fetch-depth: 1 2025-05-07T19:42:55.2676437Z fetch-tags: false 2025-05-07T19:42:55.2676688Z show-progress: true 2025-05-07T19:42:55.2676970Z lfs: false 2025-05-07T19:42:55.2677208Z set-safe-directory: true 2025-05-07T19:42:55.2677722Z env: 2025-05-07T19:42:55.2677965Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:42:55.2678335Z BUILD_ENV: build_binary 2025-05-07T19:42:55.2678610Z BUILD_TARGET: genai 2025-05-07T19:42:55.2678899Z BUILD_VARIANT: cuda 2025-05-07T19:42:55.2679257Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:42:55.2679532Z ##[endgroup] 2025-05-07T19:42:55.2724863Z ##[command]/usr/bin/docker exec 619e625bd71e3b5994636768304a866b1a468a4f4ab5c062f71de4323d5ef686 sh -c "cat /etc/*release | grep ^ID" 2025-05-07T19:42:55.5902272Z Syncing repository: pytorch/FBGEMM 2025-05-07T19:42:55.5903707Z ##[group]Getting Git version info 2025-05-07T19:42:55.5904070Z Working directory is '/__w/FBGEMM/FBGEMM' 2025-05-07T19:42:55.5904630Z [command]/usr/bin/git version 2025-05-07T19:42:55.5904945Z git version 2.47.1 2025-05-07T19:42:55.5905935Z ##[endgroup] 2025-05-07T19:42:55.5913499Z Temporarily overriding HOME='/__w/_temp/a7a7b4f3-a2b1-47ca-8e18-6b0dd025b373' before making global git config changes 2025-05-07T19:42:55.5914345Z Adding repository directory to the temporary git global config as a safe directory 2025-05-07T19:42:55.5924993Z [command]/usr/bin/git config --global --add safe.directory /__w/FBGEMM/FBGEMM 2025-05-07T19:42:55.5955129Z [command]/usr/bin/git config --local --get remote.origin.url 2025-05-07T19:42:55.5972850Z https://github.com/pytorch/FBGEMM 2025-05-07T19:42:55.5987796Z ##[group]Removing previously created refs, to avoid conflicts 2025-05-07T19:42:55.5989700Z [command]/usr/bin/git rev-parse --symbolic-full-name --verify --quiet HEAD 2025-05-07T19:42:55.6010196Z HEAD 2025-05-07T19:42:55.6043345Z ##[endgroup] 2025-05-07T19:42:55.6044070Z [command]/usr/bin/git submodule status 2025-05-07T19:42:55.6378356Z e5d7c0bd5d9aec44d68830187138149e6a8c4e32 external/asmjit (e5d7c0b) 2025-05-07T19:42:55.6449240Z 4a61bdd4bd4ed730e078aebc7c0fcf046ff29406 external/composable_kernel (4a61bdd) 2025-05-07T19:42:55.6518303Z 6543fec09b2f04ac4a666882998b534afc9c1349 external/cpuinfo (6543fec) 2025-05-07T19:42:55.6585437Z 3ed8d2ec4ba35ef5d9d8353826209b6f868f63d3 external/cutlass (3ed8d2e) 2025-05-07T19:42:55.6646370Z f8d7d77c06936315286eb55f8de22cd23c188571 external/googletest (f8d7d77) 2025-05-07T19:42:55.6716402Z 420084499c7c1e1c2d801922f40df202eac5f3a0 external/hipify_torch (4200844) 2025-05-07T19:42:55.6777764Z 9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03 external/json (9cca280) 2025-05-07T19:42:55.6784075Z ##[group]Cleaning the repository 2025-05-07T19:42:55.6788881Z [command]/usr/bin/git clean -ffdx 2025-05-07T19:42:55.7425063Z Removing amdgpu-install_6.3.60300-1_all.deb 2025-05-07T19:42:55.7425504Z Removing collect_env.py 2025-05-07T19:42:55.7425809Z Removing fbgemm_gpu/_skbuild/ 2025-05-07T19:42:55.7426252Z Removing fbgemm_gpu/bench/verify_fp16_stochastic_benchmark.hip 2025-05-07T19:42:55.7426730Z Removing fbgemm_gpu/codegen/genscript/__pycache__/ 2025-05-07T19:42:55.7427348Z Removing fbgemm_gpu/codegen/inference/embedding_forward_quantized_cpu_template_hip.cpp 2025-05-07T19:42:55.7428104Z Removing fbgemm_gpu/codegen/inference/embedding_forward_quantized_host_cpu_hip.cpp 2025-05-07T19:42:55.7428815Z Removing fbgemm_gpu/codegen/inference/embedding_forward_quantized_host_hip.cpp 2025-05-07T19:42:55.7429833Z Removing fbgemm_gpu/codegen/inference/embedding_forward_quantized_split_lookup.hip 2025-05-07T19:42:55.7430611Z Removing fbgemm_gpu/codegen/inference/embedding_forward_quantized_split_nbit_host_template.hip 2025-05-07T19:42:55.7431448Z Removing fbgemm_gpu/codegen/inference/embedding_forward_quantized_split_nbit_kernel_template.hip 2025-05-07T19:42:55.7432257Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_dense_host_cpu_hip.cpp 2025-05-07T19:42:55.7433048Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_cpu_approx_template_hip.cpp 2025-05-07T19:42:55.7433890Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_cpu_template_hip.cpp 2025-05-07T19:42:55.7434710Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_device_kernel_template_hip.cuh 2025-05-07T19:42:55.7435676Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_grad_template.hip 2025-05-07T19:42:55.7436457Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_host_cpu_template_hip.cpp 2025-05-07T19:42:55.7437299Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_host_template_hip.cpp 2025-05-07T19:42:55.7438143Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_indice_weights_template.hip 2025-05-07T19:42:55.7439039Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_kernel_cta_template.hip 2025-05-07T19:42:55.7439857Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_kernel_warp_template.hip 2025-05-07T19:42:55.7440686Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_meta_template_hip.cpp 2025-05-07T19:42:55.7441441Z Removing fbgemm_gpu/codegen/training/backward/embedding_backward_split_template.hip 2025-05-07T19:42:55.7442169Z Removing fbgemm_gpu/codegen/training/forward/embedding_forward_split_cpu_hip.cpp 2025-05-07T19:42:55.7442968Z Removing fbgemm_gpu/codegen/training/forward/embedding_forward_split_kernel_nobag_small_template.hip 2025-05-07T19:42:55.7443776Z Removing fbgemm_gpu/codegen/training/forward/embedding_forward_split_kernel_template.hip 2025-05-07T19:42:55.7444565Z Removing fbgemm_gpu/codegen/training/forward/embedding_forward_split_kernel_v2_template.hip 2025-05-07T19:42:55.7445297Z Removing fbgemm_gpu/codegen/training/forward/embedding_forward_split_template.hip 2025-05-07T19:42:55.7446039Z Removing fbgemm_gpu/codegen/training/index_select/batch_index_select_dim0_cpu_host_hip.cpp 2025-05-07T19:42:55.7446754Z Removing fbgemm_gpu/codegen/training/index_select/batch_index_select_dim0_ops_hip.cpp 2025-05-07T19:42:55.7447585Z Removing fbgemm_gpu/codegen/training/optimizer/embedding_optimizer_split_device_kernel_template_hip.cuh 2025-05-07T19:42:55.7448464Z Removing fbgemm_gpu/codegen/training/optimizer/embedding_optimizer_split_host_template_hip.cpp 2025-05-07T19:42:55.7449262Z Removing fbgemm_gpu/codegen/training/optimizer/embedding_optimizer_split_kernel_template.hip 2025-05-07T19:42:55.7450055Z Removing fbgemm_gpu/codegen/training/optimizer/embedding_optimizer_split_template.hip 2025-05-07T19:42:55.7450842Z Removing fbgemm_gpu/codegen/training/pt2/embedding_split_host_pt2_autograd_template_hip.cpp 2025-05-07T19:42:55.7451658Z Removing fbgemm_gpu/codegen/training/pt2/embedding_split_host_pt2_cpu_wrapper_template_hip.cpp 2025-05-07T19:42:55.7452602Z Removing fbgemm_gpu/codegen/training/pt2/embedding_split_host_pt2_hip_wrapper_template.cpp 2025-05-07T19:42:55.7453276Z Removing fbgemm_gpu/codegen/utils/embedding_bounds_check_host_cpu_hip.cpp 2025-05-07T19:42:55.7453883Z Removing fbgemm_gpu/codegen/utils/embedding_bounds_check_host_hip.cpp 2025-05-07T19:42:55.7454590Z Removing fbgemm_gpu/codegen/utils/embedding_bounds_check_v1.hip 2025-05-07T19:42:55.7455221Z Removing fbgemm_gpu/codegen/utils/embedding_bounds_check_v2.hip 2025-05-07T19:42:55.7455686Z Removing fbgemm_gpu/dist/ 2025-05-07T19:42:55.7456168Z Removing fbgemm_gpu/experimental/example/src/cutlass_sgemm_nn.hip 2025-05-07T19:42:55.7456747Z Removing fbgemm_gpu/experimental/example/src/example_nccl_hip.cpp 2025-05-07T19:42:55.7457420Z Removing fbgemm_gpu/experimental/gen_ai/src/attention/gqa_attn_splitk.hip 2025-05-07T19:42:55.7458024Z Removing fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.hip 2025-05-07T19:42:55.7458526Z Removing fbgemm_gpu/experimental/gen_ai/src/comm/car.hip 2025-05-07T19:42:55.7459032Z Removing fbgemm_gpu/experimental/gen_ai/src/comm/car_hip.cpp 2025-05-07T19:42:55.7459621Z Removing fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.hip 2025-05-07T19:42:55.7460206Z Removing fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.hip 2025-05-07T19:42:55.7460777Z Removing fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache_hip.cpp 2025-05-07T19:42:55.7461333Z Removing fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.hip 2025-05-07T19:42:55.7462158Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_common_hip.h 2025-05-07T19:42:55.7463089Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_common_hip.h 2025-05-07T19:42:55.7464105Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_common_hip.h 2025-05-07T19:42:55.7465187Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_common_hip.h 2025-05-07T19:42:55.7466114Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fused_moe/fused_moe_op_hip.cpp 2025-05-07T19:42:55.7466832Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cublas_utils_hip.h 2025-05-07T19:42:55.7467516Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.hip 2025-05-07T19:42:55.7468288Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16.hip 2025-05-07T19:42:55.7469098Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.hip 2025-05-07T19:42:55.7469946Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.hip 2025-05-07T19:42:55.7470739Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16.hip 2025-05-07T19:42:55.7471531Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.hip 2025-05-07T19:42:55.7472449Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.hip 2025-05-07T19:42:55.7473376Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.hip 2025-05-07T19:42:55.7474266Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.hip 2025-05-07T19:42:55.7475588Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.hip 2025-05-07T19:42:55.7476516Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.hip 2025-05-07T19:42:55.7477411Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.hip 2025-05-07T19:42:55.7478343Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.hip 2025-05-07T19:42:55.7515564Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.hip 2025-05-07T19:42:55.7516521Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.hip 2025-05-07T19:42:55.7517439Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.hip 2025-05-07T19:42:55.7518322Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.hip 2025-05-07T19:42:55.7519261Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.hip 2025-05-07T19:42:55.7520327Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.hip 2025-05-07T19:42:55.7521319Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.hip 2025-05-07T19:42:55.7522185Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.hip 2025-05-07T19:42:55.7523012Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.hip 2025-05-07T19:42:55.7523870Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.hip 2025-05-07T19:42:55.7524697Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.hip 2025-05-07T19:42:55.7525666Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.hip 2025-05-07T19:42:55.7526532Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.hip 2025-05-07T19:42:55.7527368Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.hip 2025-05-07T19:42:55.7528225Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.hip 2025-05-07T19:42:55.7529058Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.hip 2025-05-07T19:42:55.7529900Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common_hip.cuh 2025-05-07T19:42:55.7530734Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_manifest_hip.cuh 2025-05-07T19:42:55.7531459Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16.hip 2025-05-07T19:42:55.7532244Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_blockwise.hip 2025-05-07T19:42:55.7532988Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_cublas.hip 2025-05-07T19:42:55.7533714Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_lite.hip 2025-05-07T19:42:55.7534411Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise.hip 2025-05-07T19:42:55.7535580Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.hip 2025-05-07T19:42:55.7536681Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.hip 2025-05-07T19:42:55.7537804Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.hip 2025-05-07T19:42:55.7538920Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.hip 2025-05-07T19:42:55.7540006Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.hip 2025-05-07T19:42:55.7541115Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.hip 2025-05-07T19:42:55.7542187Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.hip 2025-05-07T19:42:55.7543276Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.hip 2025-05-07T19:42:55.7544367Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.hip 2025-05-07T19:42:55.7545401Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_common_hip.cuh 2025-05-07T19:42:55.7546405Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/common_hip.cuh 2025-05-07T19:42:55.7547783Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.hip 2025-05-07T19:42:55.7549028Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.hip 2025-05-07T19:42:55.7550137Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.hip 2025-05-07T19:42:55.7551134Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.hip 2025-05-07T19:42:55.7552151Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.hip 2025-05-07T19:42:55.7553124Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.hip 2025-05-07T19:42:55.7553886Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.hip 2025-05-07T19:42:55.7554646Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_rowwise.hip 2025-05-07T19:42:55.7555377Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled.hip 2025-05-07T19:42:55.7556133Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.hip 2025-05-07T19:42:55.7556883Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16.hip 2025-05-07T19:42:55.7557565Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16_dynamic.hip 2025-05-07T19:42:55.7558404Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/include/fp8_blockwise_cutlass_helpers_hip.h 2025-05-07T19:42:55.7559227Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.hip 2025-05-07T19:42:55.7559926Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16_fast_gemv.hip 2025-05-07T19:42:55.7560602Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.hip 2025-05-07T19:42:55.7561273Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.hip 2025-05-07T19:42:55.7561958Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.hip 2025-05-07T19:42:55.7562605Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv_hip.cuh 2025-05-07T19:42:55.7563294Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/utility_hip.cuh 2025-05-07T19:42:55.7563900Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.hip 2025-05-07T19:42:55.7564415Z Removing fbgemm_gpu/experimental/gen_ai/src/quantize/quantize_hip.cpp 2025-05-07T19:42:55.7564904Z Removing fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:42:55.7565272Z Removing fbgemm_gpu/fbgemm_gpu_nightly.egg-info/ 2025-05-07T19:42:55.7565712Z Removing fbgemm_gpu/include/fbgemm_gpu/cumem_utils_hip.h 2025-05-07T19:42:55.7566245Z Removing fbgemm_gpu/include/fbgemm_gpu/embedding_backward_template_helpers_hip.cuh 2025-05-07T19:42:55.7566866Z Removing fbgemm_gpu/include/fbgemm_gpu/embedding_forward_split_cpu_hip.h 2025-05-07T19:42:55.7567480Z Removing fbgemm_gpu/include/fbgemm_gpu/embedding_forward_template_helpers_hip.cuh 2025-05-07T19:42:55.7568048Z Removing fbgemm_gpu/include/fbgemm_gpu/layout_transform_ops_hip.cuh 2025-05-07T19:42:55.7568632Z Removing fbgemm_gpu/include/fbgemm_gpu/permute_multi_embedding_function_hip.h 2025-05-07T19:42:55.7569154Z Removing fbgemm_gpu/include/fbgemm_gpu/quantize_ops_hip.cuh 2025-05-07T19:42:55.7569631Z Removing fbgemm_gpu/include/fbgemm_gpu/sparse_ops_hip.cuh 2025-05-07T19:42:55.7570144Z Removing fbgemm_gpu/include/fbgemm_gpu/split_embeddings_utils_hip.cuh 2025-05-07T19:42:55.7570682Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/barrier_isolation_hip.cuh 2025-05-07T19:42:55.7571234Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/bench_utils_hip.cuh 2025-05-07T19:42:55.7571762Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/bitonic_sort_hip.cuh 2025-05-07T19:42:55.7572321Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/cub_namespace_postfix_hip.cuh 2025-05-07T19:42:55.7572888Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/cub_namespace_prefix_hip.cuh 2025-05-07T19:42:55.7573478Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/device_cache_flusher_hip.cuh 2025-05-07T19:42:55.7574055Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/device_properties_hip.cuh 2025-05-07T19:42:55.7574576Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/dispatch_macros_hip.h 2025-05-07T19:42:55.7575574Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/embedding_bounds_check_common_hip.cuh 2025-05-07T19:42:55.7576179Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/find_qparams_hip.cuh 2025-05-07T19:42:55.7576899Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/float_hip.cuh 2025-05-07T19:42:55.7577383Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/hip_prelude.cuh 2025-05-07T19:42:55.7577966Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/host_device_buffer_pair_hip.cuh 2025-05-07T19:42:55.7578605Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/inclusive_sum_scan_hip.cuh 2025-05-07T19:42:55.7579171Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/kernel_launcher_hip.cuh 2025-05-07T19:42:55.7579783Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/rocm/stochastic_rounding_hip.h 2025-05-07T19:42:55.7580335Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/rocm/vec2_hip.h 2025-05-07T19:42:55.7580873Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/rocm/weight_row_hip.h 2025-05-07T19:42:55.7581407Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/shared_memory_hip.cuh 2025-05-07T19:42:55.7581994Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding_hip.cuh 2025-05-07T19:42:55.7582624Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/tensor_accessor_builder_hip.h 2025-05-07T19:42:55.7583203Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/tensor_accessor_hip.h 2025-05-07T19:42:55.7583702Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/vec4_hip.cuh 2025-05-07T19:42:55.7584174Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/vec4acc_hip.cuh 2025-05-07T19:42:55.7584702Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/vec_quant_hip.cuh 2025-05-07T19:42:55.7585186Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/vecn_hip.cuh 2025-05-07T19:42:55.7585703Z Removing fbgemm_gpu/include/fbgemm_gpu/utils/weight_row_hip.cuh 2025-05-07T19:42:55.7586298Z Removing fbgemm_gpu/src/dram_kv_embedding_cache/dram_kv_embedding_cache_hip.h 2025-05-07T19:42:55.7586942Z Removing fbgemm_gpu/src/dram_kv_embedding_cache/dram_kv_embedding_cache_wrapper_hip.h 2025-05-07T19:42:55.7587624Z Removing fbgemm_gpu/src/embedding_inplace_ops/embedding_inplace_update.hip 2025-05-07T19:42:55.7588378Z Removing fbgemm_gpu/src/embedding_inplace_ops/embedding_inplace_update_gpu_hip.cpp 2025-05-07T19:42:55.7588968Z Removing fbgemm_gpu/src/histogram_binning_calibration_ops.hip 2025-05-07T19:42:55.7589459Z Removing fbgemm_gpu/src/input_combine_ops/input_combine.hip 2025-05-07T19:42:55.7589939Z Removing fbgemm_gpu/src/input_combine_ops/input_combine_cpu_hip.cpp 2025-05-07T19:42:55.7590562Z Removing fbgemm_gpu/src/intraining_embedding_pruning_ops/intraining_embedding_pruning.hip 2025-05-07T19:42:55.7591270Z Removing fbgemm_gpu/src/intraining_embedding_pruning_ops/intraining_embedding_pruning_gpu_hip.cpp 2025-05-07T19:42:55.7591978Z Removing fbgemm_gpu/src/jagged_tensor_ops/batched_dense_vec_jagged_2d_mul_backward.hip 2025-05-07T19:42:55.7592619Z Removing fbgemm_gpu/src/jagged_tensor_ops/batched_dense_vec_jagged_2d_mul_forward.hip 2025-05-07T19:42:55.7593190Z Removing fbgemm_gpu/src/jagged_tensor_ops/common_hip.cuh 2025-05-07T19:42:55.7593702Z Removing fbgemm_gpu/src/jagged_tensor_ops/dense_to_jagged_forward.hip 2025-05-07T19:42:55.7594228Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_dense_bmm_forward.hip 2025-05-07T19:42:55.7594917Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_dense_dense_elementwise_add_jagged_output_forward.hip 2025-05-07T19:42:55.7595612Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_dense_elementwise_mul_backward.hip 2025-05-07T19:42:55.7596350Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_dense_elementwise_mul_forward.hip 2025-05-07T19:42:55.7596973Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_index_add_2d_forward.hip 2025-05-07T19:42:55.7597540Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_index_select_2d_forward.hip 2025-05-07T19:42:55.7598131Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_jagged_bmm_forward.hip 2025-05-07T19:42:55.7598665Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_softmax_backward.hip 2025-05-07T19:42:55.7599214Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_softmax_forward.hip 2025-05-07T19:42:55.7599714Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_tensor_ops.hip 2025-05-07T19:42:55.7600264Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_tensor_ops_cpu_hip.cpp 2025-05-07T19:42:55.7601098Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_to_padded_dense_backward.hip 2025-05-07T19:42:55.7601679Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_to_padded_dense_forward.hip 2025-05-07T19:42:55.7602255Z Removing fbgemm_gpu/src/jagged_tensor_ops/jagged_unique_indices.hip 2025-05-07T19:42:55.7602789Z Removing fbgemm_gpu/src/jagged_tensor_ops/keyed_jagged_index_select_dim1.hip 2025-05-07T19:42:55.7603369Z Removing fbgemm_gpu/src/layout_transform_ops/layout_transform_ops.hip 2025-05-07T19:42:55.7603956Z Removing fbgemm_gpu/src/layout_transform_ops/layout_transform_ops_cpu_hip.cpp 2025-05-07T19:42:55.7604445Z Removing fbgemm_gpu/src/memory_utils/common_hip.cuh 2025-05-07T19:42:55.7604875Z Removing fbgemm_gpu/src/memory_utils/memory_utils.hip 2025-05-07T19:42:55.7605296Z Removing fbgemm_gpu/src/memory_utils/memory_utils_hip.cpp 2025-05-07T19:42:55.7605772Z Removing fbgemm_gpu/src/memory_utils/memory_utils_ops.hip 2025-05-07T19:42:55.7606215Z Removing fbgemm_gpu/src/memory_utils/memory_utils_ops_hip.cpp 2025-05-07T19:42:55.7606790Z Removing fbgemm_gpu/src/merge_pooled_embedding_ops/merge_pooled_embedding_ops_cpu_hip.cpp 2025-05-07T19:42:55.7607474Z Removing fbgemm_gpu/src/merge_pooled_embedding_ops/merge_pooled_embedding_ops_gpu_hip.cpp 2025-05-07T19:42:55.7608001Z Removing fbgemm_gpu/src/metric_ops/metric_ops.hip 2025-05-07T19:42:55.7608562Z Removing fbgemm_gpu/src/permute_multi_embedding_ops/permute_multi_embedding_function_hip.cpp 2025-05-07T19:42:55.7609225Z Removing fbgemm_gpu/src/permute_multi_embedding_ops/permute_multi_embedding_ops.hip 2025-05-07T19:42:55.7609910Z Removing fbgemm_gpu/src/permute_multi_embedding_ops/permute_multi_embedding_ops_cpu_hip.cpp 2025-05-07T19:42:55.7610578Z Removing fbgemm_gpu/src/permute_pooled_embedding_ops/permute_pooled_embedding_ops.hip 2025-05-07T19:42:55.7611284Z Removing fbgemm_gpu/src/permute_pooled_embedding_ops/permute_pooled_embedding_ops_cpu_hip.cpp 2025-05-07T19:42:55.7612000Z Removing fbgemm_gpu/src/permute_pooled_embedding_ops/permute_pooled_embedding_ops_split.hip 2025-05-07T19:42:55.7612673Z Removing fbgemm_gpu/src/ps_split_embeddings_cache/ps_split_table_batched_embeddings_hip.cpp 2025-05-07T19:42:55.7613345Z Removing fbgemm_gpu/src/ps_split_embeddings_cache/ps_table_batched_embeddings_hip.h 2025-05-07T19:42:55.7613879Z Removing fbgemm_gpu/src/quantize_ops/common_hip.cuh 2025-05-07T19:42:55.7614468Z Removing fbgemm_gpu/src/quantize_ops/mx/common_hip.cuh 2025-05-07T19:42:55.7614982Z Removing fbgemm_gpu/src/quantize_ops/mx_common_hip.cuh 2025-05-07T19:42:55.7615598Z Removing fbgemm_gpu/src/quantize_ops/quantize_bfloat16.hip 2025-05-07T19:42:55.7616206Z Removing fbgemm_gpu/src/quantize_ops/quantize_fp8_rowwise.hip 2025-05-07T19:42:55.7617028Z Removing fbgemm_gpu/src/quantize_ops/quantize_fused_8bit_rowwise.hip 2025-05-07T19:42:55.7617778Z Removing fbgemm_gpu/src/quantize_ops/quantize_fused_nbit_rowwise.hip 2025-05-07T19:42:55.7618326Z Removing fbgemm_gpu/src/quantize_ops/quantize_hfp8.hip 2025-05-07T19:42:55.7618778Z Removing fbgemm_gpu/src/quantize_ops/quantize_msfp.hip 2025-05-07T19:42:55.7619237Z Removing fbgemm_gpu/src/quantize_ops/quantize_mx.hip 2025-05-07T19:42:55.7619674Z Removing fbgemm_gpu/src/quantize_ops/quantize_mx_hip.cuh 2025-05-07T19:42:55.7620248Z Removing fbgemm_gpu/src/quantize_ops/quantize_ops_cpu_hip.cpp 2025-05-07T19:42:55.7620787Z Removing fbgemm_gpu/src/quantize_ops/quantize_padded_fp8_rowwise.hip 2025-05-07T19:42:55.7621313Z Removing fbgemm_gpu/src/sparse_ops/common_hip.cuh 2025-05-07T19:42:55.7621817Z Removing fbgemm_gpu/src/sparse_ops/sparse_async_batched_cumsum.hip 2025-05-07T19:42:55.7622376Z Removing fbgemm_gpu/src/sparse_ops/sparse_async_batched_cumsum_hip.cpp 2025-05-07T19:42:55.7623094Z Removing fbgemm_gpu/src/sparse_ops/sparse_async_cumsum.hip 2025-05-07T19:42:55.7623755Z Removing fbgemm_gpu/src/sparse_ops/sparse_async_cumsum_hip.cpp 2025-05-07T19:42:55.7624326Z Removing fbgemm_gpu/src/sparse_ops/sparse_batched_unary_embeddings.hip 2025-05-07T19:42:55.7624993Z Removing fbgemm_gpu/src/sparse_ops/sparse_block_bucketize_features.hip 2025-05-07T19:42:55.7625539Z Removing fbgemm_gpu/src/sparse_ops/sparse_bucketize_features.hip 2025-05-07T19:42:55.7626127Z Removing fbgemm_gpu/src/sparse_ops/sparse_compute_frequency_sequence.hip 2025-05-07T19:42:55.7626726Z Removing fbgemm_gpu/src/sparse_ops/sparse_expand_into_jagged_permute.hip 2025-05-07T19:42:55.7627299Z Removing fbgemm_gpu/src/sparse_ops/sparse_group_index.hip 2025-05-07T19:42:55.7627856Z Removing fbgemm_gpu/src/sparse_ops/sparse_index_add.hip 2025-05-07T19:42:55.7628327Z Removing fbgemm_gpu/src/sparse_ops/sparse_index_select.hip 2025-05-07T19:42:55.7628827Z Removing fbgemm_gpu/src/sparse_ops/sparse_invert_permute.hip 2025-05-07T19:42:55.7629287Z Removing fbgemm_gpu/src/sparse_ops/sparse_ops_cpu_hip.cpp 2025-05-07T19:42:55.7629808Z Removing fbgemm_gpu/src/sparse_ops/sparse_pack_segments_backward.hip 2025-05-07T19:42:55.7630343Z Removing fbgemm_gpu/src/sparse_ops/sparse_pack_segments_forward.hip 2025-05-07T19:42:55.7630867Z Removing fbgemm_gpu/src/sparse_ops/sparse_permute102.hip 2025-05-07T19:42:55.7631312Z Removing fbgemm_gpu/src/sparse_ops/sparse_permute_1d.hip 2025-05-07T19:42:55.7631776Z Removing fbgemm_gpu/src/sparse_ops/sparse_permute_2d.hip 2025-05-07T19:42:55.7632279Z Removing fbgemm_gpu/src/sparse_ops/sparse_permute_embeddings.hip 2025-05-07T19:42:55.7632735Z Removing fbgemm_gpu/src/sparse_ops/sparse_range.hip 2025-05-07T19:42:55.7633211Z Removing fbgemm_gpu/src/sparse_ops/sparse_reorder_batched_ad.hip 2025-05-07T19:42:55.7633698Z Removing fbgemm_gpu/src/sparse_ops/sparse_segment_sum_csr.hip 2025-05-07T19:42:55.7634165Z Removing fbgemm_gpu/src/sparse_ops/sparse_zipf.hip 2025-05-07T19:42:55.7634638Z Removing fbgemm_gpu/src/split_embeddings_cache/cachelib_cache_hip.cpp 2025-05-07T19:42:55.7635181Z Removing fbgemm_gpu/src/split_embeddings_cache/common_hip.cuh 2025-05-07T19:42:55.7635676Z Removing fbgemm_gpu/src/split_embeddings_cache/common_hip.h 2025-05-07T19:42:55.7636160Z Removing fbgemm_gpu/src/split_embeddings_cache/lfu_cache_find.hip 2025-05-07T19:42:55.7636712Z Removing fbgemm_gpu/src/split_embeddings_cache/lfu_cache_populate.hip 2025-05-07T19:42:55.7637274Z Removing fbgemm_gpu/src/split_embeddings_cache/lfu_cache_populate_byte.hip 2025-05-07T19:42:55.7637895Z Removing fbgemm_gpu/src/split_embeddings_cache/lfu_cache_populate_byte_hip.cpp 2025-05-07T19:42:55.7638486Z Removing fbgemm_gpu/src/split_embeddings_cache/linearize_cache_indices.hip 2025-05-07T19:42:55.7639103Z Removing fbgemm_gpu/src/split_embeddings_cache/linearize_cache_indices_hip.cpp 2025-05-07T19:42:55.7639688Z Removing fbgemm_gpu/src/split_embeddings_cache/lru_cache_find.hip 2025-05-07T19:42:55.7640213Z Removing fbgemm_gpu/src/split_embeddings_cache/lru_cache_populate.hip 2025-05-07T19:42:55.7640902Z Removing fbgemm_gpu/src/split_embeddings_cache/lru_cache_populate_byte.hip 2025-05-07T19:42:55.7641468Z Removing fbgemm_gpu/src/split_embeddings_cache/lru_cache_populate_byte_hip.cpp 2025-05-07T19:42:55.7642014Z Removing fbgemm_gpu/src/split_embeddings_cache/lxu_cache.hip 2025-05-07T19:42:55.7642481Z Removing fbgemm_gpu/src/split_embeddings_cache/lxu_cache_hip.cpp 2025-05-07T19:42:55.7643019Z Removing fbgemm_gpu/src/split_embeddings_cache/reset_weight_momentum.hip 2025-05-07T19:42:55.7643688Z Removing fbgemm_gpu/src/split_embeddings_cache/split_embeddings_cache_ops.hip 2025-05-07T19:42:55.7644287Z Removing fbgemm_gpu/src/split_embeddings_cache/split_embeddings_cache_ops_hip.cpp 2025-05-07T19:42:55.7644898Z Removing fbgemm_gpu/src/split_embeddings_utils/generate_vbe_metadata.hip 2025-05-07T19:42:55.7645429Z Removing fbgemm_gpu/src/split_embeddings_utils/get_infos_metadata.hip 2025-05-07T19:42:55.7645973Z Removing fbgemm_gpu/src/split_embeddings_utils/radix_sort_pairs.hip 2025-05-07T19:42:55.7646548Z Removing fbgemm_gpu/src/split_embeddings_utils/split_embeddings_utils_hip.cpp 2025-05-07T19:42:55.7647131Z Removing fbgemm_gpu/src/split_embeddings_utils/transpose_embedding_input.hip 2025-05-07T19:42:55.7647757Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/embedding_rocksdb_wrapper_hip.h 2025-05-07T19:42:55.7648386Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/kv_db_hip_utils.cpp 2025-05-07T19:42:55.7648928Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/kv_db_hip_utils.h 2025-05-07T19:42:55.7649510Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/kv_db_table_batched_embeddings_hip.cpp 2025-05-07T19:42:55.7650197Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/kv_db_table_batched_embeddings_hip.h 2025-05-07T19:42:55.7650843Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/kv_tensor_wrapper_cpu_hip.cpp 2025-05-07T19:42:55.7651470Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/ssd_scratch_pad_indices_queue_hip.cpp 2025-05-07T19:42:55.7652137Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/ssd_split_embeddings_cache_hip.hip 2025-05-07T19:42:55.7652802Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/ssd_split_table_batched_embeddings_hip.cpp 2025-05-07T19:42:55.7653493Z Removing fbgemm_gpu/src/ssd_split_embeddings_cache/ssd_table_batched_embeddings_hip.h 2025-05-07T19:42:55.7654036Z Removing fbgemm_gpu/src/topology_utils_hip.cpp 2025-05-07T19:42:55.7654444Z Removing fbgemm_gpu/test/tbe/utils/cpu_kernel_test_hip.cpp 2025-05-07T19:42:55.7654976Z Removing fbgemm_gpu/test/utils/kernel_launcher_test.hip 2025-05-07T19:42:55.7655598Z Removing fbgemm_gpu/test/utils/stochastic_rounding_test.hip 2025-05-07T19:42:55.7656111Z Removing fbgemm_gpu/test/utils/tensor_accessor2_test.hip 2025-05-07T19:42:55.7656597Z Removing fbgemm_gpu/test/utils/tensor_accessor_builder_test.hip 2025-05-07T19:42:55.7657196Z Removing fbgemm_gpu/test/utils/tensor_accessor_builder_with_memcheck_test.hip 2025-05-07T19:42:55.7657776Z Removing fbgemm_gpu/test/utils/tensor_accessor_test.hip 2025-05-07T19:42:55.7658281Z Removing fbgemm_gpu/test/utils/tensor_accessor_with_memcheck_test.hip 2025-05-07T19:42:55.7658804Z Removing fbgemm_gpu/test/utils/weight_row_test.hip 2025-05-07T19:42:55.7661461Z [command]/usr/bin/git reset --hard HEAD 2025-05-07T19:42:55.8564779Z HEAD is now at 1c9ad64 Merge f6528e7b1e8f5602e7dba30cd73b48ae6630981c into fd4df5f456e0cca514bacd98a39efb72990fd9f4 2025-05-07T19:42:55.8568259Z ##[endgroup] 2025-05-07T19:42:55.8569433Z ##[group]Disabling automatic garbage collection 2025-05-07T19:42:55.8572844Z [command]/usr/bin/git config --local gc.auto 0 2025-05-07T19:42:55.8602171Z ##[endgroup] 2025-05-07T19:42:55.8603238Z ##[group]Setting up auth 2025-05-07T19:42:55.8604482Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-05-07T19:42:55.8629457Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-05-07T19:42:55.8958272Z Entering 'external/asmjit' 2025-05-07T19:42:55.9008661Z Entering 'external/composable_kernel' 2025-05-07T19:42:55.9060707Z Entering 'external/cpuinfo' 2025-05-07T19:42:55.9114627Z Entering 'external/cutlass' 2025-05-07T19:42:55.9171070Z Entering 'external/googletest' 2025-05-07T19:42:55.9217308Z Entering 'external/hipify_torch' 2025-05-07T19:42:55.9263001Z Entering 'external/json' 2025-05-07T19:42:55.9337091Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-05-07T19:42:55.9360660Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-05-07T19:42:55.9626139Z Entering 'external/asmjit' 2025-05-07T19:42:55.9675466Z Entering 'external/composable_kernel' 2025-05-07T19:42:55.9729171Z Entering 'external/cpuinfo' 2025-05-07T19:42:55.9776170Z Entering 'external/cutlass' 2025-05-07T19:42:55.9831052Z Entering 'external/googletest' 2025-05-07T19:42:55.9876729Z Entering 'external/hipify_torch' 2025-05-07T19:42:55.9927891Z Entering 'external/json' 2025-05-07T19:42:55.9989784Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-05-07T19:42:56.0032691Z ##[endgroup] 2025-05-07T19:42:56.0033694Z ##[group]Fetching the repository 2025-05-07T19:42:56.0036783Z [command]/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +a2f4c52051596e74bc8c16e3d2867a4ecdd271e0:refs/remotes/pull/4066/merge 2025-05-07T19:42:56.1477839Z From https://github.com/pytorch/FBGEMM 2025-05-07T19:42:56.1479530Z + 1c9ad64...a2f4c52 a2f4c52051596e74bc8c16e3d2867a4ecdd271e0 -> pull/4066/merge (forced update) 2025-05-07T19:42:56.1496494Z ##[endgroup] 2025-05-07T19:42:56.1497606Z ##[group]Determining the checkout info 2025-05-07T19:42:56.1498975Z ##[endgroup] 2025-05-07T19:42:56.1499717Z [command]/usr/bin/git sparse-checkout disable 2025-05-07T19:42:56.1989764Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-05-07T19:42:56.2017362Z ##[group]Checking out the ref 2025-05-07T19:42:56.2018679Z [command]/usr/bin/git checkout --progress --force refs/remotes/pull/4066/merge 2025-05-07T19:42:56.2088397Z Warning: you are leaving 1 commit behind, not connected to 2025-05-07T19:42:56.2089071Z any of your branches: 2025-05-07T19:42:56.2089226Z 2025-05-07T19:42:56.2089605Z 1c9ad64 Merge f6528e7b1e8f5602e7dba30cd73b48ae6630981c into fd4df5f456e0cca514bacd98a39efb72990fd9f4 2025-05-07T19:42:56.2090179Z 2025-05-07T19:42:56.2090376Z If you want to keep it by creating a new branch, this may be a good time 2025-05-07T19:42:56.2090990Z to do so with: 2025-05-07T19:42:56.2091260Z 2025-05-07T19:42:56.2091381Z git branch 1c9ad64 2025-05-07T19:42:56.2091582Z 2025-05-07T19:42:56.2091990Z HEAD is now at a2f4c52 Merge 6060cd4b5f971680caecdcc657faccb5720d1c3e into fd4df5f456e0cca514bacd98a39efb72990fd9f4 2025-05-07T19:42:56.2093135Z ##[endgroup] 2025-05-07T19:42:56.2093545Z ##[group]Setting up auth for fetching submodules 2025-05-07T19:42:56.2096481Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-05-07T19:42:56.2137379Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-05-07T19:42:56.2156983Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-05-07T19:42:56.2180039Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-05-07T19:42:56.2201672Z ##[endgroup] 2025-05-07T19:42:56.2202271Z ##[group]Fetching submodules 2025-05-07T19:42:56.2203094Z [command]/usr/bin/git submodule sync 2025-05-07T19:42:56.2497964Z Synchronizing submodule url for 'external/asmjit' 2025-05-07T19:42:56.2498480Z Synchronizing submodule url for 'external/composable_kernel' 2025-05-07T19:42:56.2498998Z Synchronizing submodule url for 'external/cpuinfo' 2025-05-07T19:42:56.2499495Z Synchronizing submodule url for 'external/cutlass' 2025-05-07T19:42:56.2499960Z Synchronizing submodule url for 'external/googletest' 2025-05-07T19:42:56.2500408Z Synchronizing submodule url for 'external/hipify_torch' 2025-05-07T19:42:56.2500899Z Synchronizing submodule url for 'external/json' 2025-05-07T19:42:56.2506587Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --depth=1 2025-05-07T19:42:56.3221709Z Submodule path 'external/asmjit': checked out 'e5d7c0bd5d9aec44d68830187138149e6a8c4e32' 2025-05-07T19:42:56.5883380Z Submodule path 'external/composable_kernel': checked out '4a61bdd4bd4ed730e078aebc7c0fcf046ff29406' 2025-05-07T19:42:56.6817723Z Submodule path 'external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-05-07T19:42:57.3429151Z Submodule path 'external/cutlass': checked out '3ed8d2ec4ba35ef5d9d8353826209b6f868f63d3' 2025-05-07T19:42:57.3825799Z Submodule path 'external/googletest': checked out 'f8d7d77c06936315286eb55f8de22cd23c188571' 2025-05-07T19:42:57.3914163Z Submodule path 'external/hipify_torch': checked out '420084499c7c1e1c2d801922f40df202eac5f3a0' 2025-05-07T19:42:57.4967762Z Submodule path 'external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-05-07T19:42:57.4976520Z [command]/usr/bin/git submodule foreach git config --local gc.auto 0 2025-05-07T19:42:57.5253827Z Entering 'external/asmjit' 2025-05-07T19:42:57.5277605Z Entering 'external/composable_kernel' 2025-05-07T19:42:57.5310895Z Entering 'external/cpuinfo' 2025-05-07T19:42:57.5337041Z Entering 'external/cutlass' 2025-05-07T19:42:57.5369388Z Entering 'external/googletest' 2025-05-07T19:42:57.5398749Z Entering 'external/hipify_torch' 2025-05-07T19:42:57.5432417Z Entering 'external/json' 2025-05-07T19:42:57.5463036Z ##[endgroup] 2025-05-07T19:42:57.5463527Z ##[group]Persisting credentials for submodules 2025-05-07T19:42:57.5469294Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-05-07T19:42:57.5736987Z Entering 'external/asmjit' 2025-05-07T19:42:57.5778388Z url.https://github.com/.insteadof 2025-05-07T19:42:57.5778819Z url.https://github.com/.insteadof 2025-05-07T19:42:57.5814700Z Entering 'external/composable_kernel' 2025-05-07T19:42:57.5858299Z url.https://github.com/.insteadof 2025-05-07T19:42:57.5859152Z url.https://github.com/.insteadof 2025-05-07T19:42:57.5900406Z Entering 'external/cpuinfo' 2025-05-07T19:42:57.5947654Z url.https://github.com/.insteadof 2025-05-07T19:42:57.5948192Z url.https://github.com/.insteadof 2025-05-07T19:42:57.5978508Z Entering 'external/cutlass' 2025-05-07T19:42:57.6018080Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6018487Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6059482Z Entering 'external/googletest' 2025-05-07T19:42:57.6098741Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6099775Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6135011Z Entering 'external/hipify_torch' 2025-05-07T19:42:57.6167703Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6168149Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6201074Z Entering 'external/json' 2025-05-07T19:42:57.6240542Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6241063Z url.https://github.com/.insteadof 2025-05-07T19:42:57.6294048Z [command]/usr/bin/git submodule foreach sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-05-07T19:42:57.6597963Z Entering 'external/asmjit' 2025-05-07T19:42:57.6653100Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/asmjit/config remote.origin.url 2025-05-07T19:42:57.6654594Z Entering 'external/composable_kernel' 2025-05-07T19:42:57.6699467Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/composable_kernel/config remote.origin.url 2025-05-07T19:42:57.6700060Z Entering 'external/cpuinfo' 2025-05-07T19:42:57.6746115Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/cpuinfo/config remote.origin.url 2025-05-07T19:42:57.6747536Z Entering 'external/cutlass' 2025-05-07T19:42:57.6796835Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/cutlass/config remote.origin.url 2025-05-07T19:42:57.6798068Z Entering 'external/googletest' 2025-05-07T19:42:57.6844137Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/googletest/config remote.origin.url 2025-05-07T19:42:57.6845673Z Entering 'external/hipify_torch' 2025-05-07T19:42:57.6894295Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/hipify_torch/config remote.origin.url 2025-05-07T19:42:57.6896049Z Entering 'external/json' 2025-05-07T19:42:57.6937807Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/json/config remote.origin.url 2025-05-07T19:42:57.7050756Z [command]/usr/bin/git submodule foreach git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-05-07T19:42:57.7328495Z Entering 'external/asmjit' 2025-05-07T19:42:57.7349774Z Entering 'external/composable_kernel' 2025-05-07T19:42:57.7375191Z Entering 'external/cpuinfo' 2025-05-07T19:42:57.7408146Z Entering 'external/cutlass' 2025-05-07T19:42:57.7436825Z Entering 'external/googletest' 2025-05-07T19:42:57.7467128Z Entering 'external/hipify_torch' 2025-05-07T19:42:57.7496682Z Entering 'external/json' 2025-05-07T19:42:57.7546348Z [command]/usr/bin/git submodule foreach git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-05-07T19:42:57.7844661Z Entering 'external/asmjit' 2025-05-07T19:42:57.7863008Z Entering 'external/composable_kernel' 2025-05-07T19:42:57.7899472Z Entering 'external/cpuinfo' 2025-05-07T19:42:57.7927902Z Entering 'external/cutlass' 2025-05-07T19:42:57.7958505Z Entering 'external/googletest' 2025-05-07T19:42:57.7984267Z Entering 'external/hipify_torch' 2025-05-07T19:42:57.8017239Z Entering 'external/json' 2025-05-07T19:42:57.8058761Z ##[endgroup] 2025-05-07T19:42:57.8095392Z [command]/usr/bin/git log -1 --format=%H 2025-05-07T19:42:57.8117824Z a2f4c52051596e74bc8c16e3d2867a4ecdd271e0 2025-05-07T19:42:57.8263329Z ##[group]Run . $PRELUDE; print_system_info 2025-05-07T19:42:57.8263759Z . $PRELUDE; print_system_info 2025-05-07T19:42:57.8264294Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:42:57.8264686Z env: 2025-05-07T19:42:57.8264927Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:42:57.8265280Z BUILD_ENV: build_binary 2025-05-07T19:42:57.8265544Z BUILD_TARGET: genai 2025-05-07T19:42:57.8265816Z BUILD_VARIANT: cuda 2025-05-07T19:42:57.8266070Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:42:57.8266370Z ##[endgroup] 2025-05-07T19:42:58.2706793Z ################################################################################ 2025-05-07T19:42:58.2707629Z # Print System Info 2025-05-07T19:42:58.2707985Z # 2025-05-07T19:42:58.2723518Z # [2025-05-07T19:42:58.271Z] + print_system_info 2025-05-07T19:42:58.2724667Z ################################################################################ 2025-05-07T19:42:58.2725350Z 2025-05-07T19:42:58.2725876Z ################################################################################ 2025-05-07T19:42:58.2726281Z [INFO] Printing environment variables ... 2025-05-07T19:42:58.2726635Z + printenv 2025-05-07T19:42:58.2726783Z 2025-05-07T19:42:58.2736300Z GITHUB_WORKSPACE=/__w/FBGEMM/FBGEMM 2025-05-07T19:42:58.2737328Z BUILD_VARIANT=cuda 2025-05-07T19:42:58.2737899Z HOSTNAME=619e625bd71e 2025-05-07T19:42:58.2738349Z GITHUB_PATH=/__w/_temp/_runner_file_commands/add_path_f9e83ba3-e620-4bf5-9b38-cc3fd29d25d2 2025-05-07T19:42:58.2738876Z GITHUB_ACTION=__run_2 2025-05-07T19:42:58.2739140Z GITHUB_RUN_NUMBER=10601 2025-05-07T19:42:58.2739429Z RUNNER_NAME=i-02bb5f24b31fe2d00 2025-05-07T19:42:58.2739727Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-05-07T19:42:58.2740078Z PLATFORM_NAME_LC=linux-x86_64 2025-05-07T19:42:58.2740368Z MACHINE_NAME_LC=x86_64 2025-05-07T19:42:58.2740657Z GITHUB_TRIGGERING_ACTOR=q10 2025-05-07T19:42:58.2740985Z PRELUDE=.github/scripts/setup_env.bash 2025-05-07T19:42:58.2741303Z GITHUB_REF_TYPE=branch 2025-05-07T19:42:58.2741869Z *** 2025-05-07T19:42:58.2742101Z GITHUB_REPOSITORY_ID=150154628 2025-05-07T19:42:58.2742417Z GITHUB_ACTIONS=true 2025-05-07T19:42:58.2742707Z GITHUB_SHA=a2f4c52051596e74bc8c16e3d2867a4ecdd271e0 2025-05-07T19:42:58.2743323Z GITHUB_WORKFLOW_REF=pytorch/FBGEMM/.github/workflows/fbgemm_gpu_ci_cuda.yml@refs/pull/4066/merge 2025-05-07T19:42:58.2743989Z RUNNER_ENVIRONMENT=self-hosted 2025-05-07T19:42:58.2744297Z GITHUB_REF=refs/pull/4066/merge 2025-05-07T19:42:58.2744588Z RUNNER_OS=Linux 2025-05-07T19:42:58.2744823Z GITHUB_REF_PROTECTED=false 2025-05-07T19:42:58.2745109Z HOME=/github/home 2025-05-07T19:42:58.2745371Z GITHUB_API_URL=https://api.github.com 2025-05-07T19:42:58.2745700Z RUNNER_ARCH=X64 2025-05-07T19:42:58.2745932Z RUNNER_TEMP=/__w/_temp 2025-05-07T19:42:58.2746205Z BUILD_TARGET=genai 2025-05-07T19:42:58.2746627Z GITHUB_STATE=/__w/_temp/_runner_file_commands/save_state_f9e83ba3-e620-4bf5-9b38-cc3fd29d25d2 2025-05-07T19:42:58.2747301Z GITHUB_ENV=/__w/_temp/_runner_file_commands/set_env_f9e83ba3-e620-4bf5-9b38-cc3fd29d25d2 2025-05-07T19:42:58.2747829Z GITHUB_EVENT_PATH=/github/workflow/event.json 2025-05-07T19:42:58.2748169Z GITHUB_EVENT_NAME=pull_request 2025-05-07T19:42:58.2748451Z GITHUB_RUN_ID=14891846252 2025-05-07T19:42:58.2749293Z GITHUB_STEP_SUMMARY=/__w/_temp/_runner_file_commands/step_summary_f9e83ba3-e620-4bf5-9b38-cc3fd29d25d2 2025-05-07T19:42:58.2749862Z BUILD_ENV=build_binary 2025-05-07T19:42:58.2750118Z GITHUB_ACTOR=q10 2025-05-07T19:42:58.2750386Z GITHUB_RUN_ATTEMPT=1 2025-05-07T19:42:58.2750640Z KERN_NAME_LC=linux 2025-05-07T19:42:58.2750913Z BUILD_CUDA_VERSION=12.8.0 2025-05-07T19:42:58.2751412Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-05-07T19:42:58.2751809Z PLATFORM_NAME=Linux-x86_64 2025-05-07T19:42:58.2752103Z GITHUB_SERVER_URL=https://github.com 2025-05-07T19:42:58.2752439Z SHLVL=1 2025-05-07T19:42:58.2752656Z GITHUB_ACTOR_ID=255046 2025-05-07T19:42:58.2752953Z RUNNER_TOOL_CACHE=/__w/_tool 2025-05-07T19:42:58.2753477Z GITHUB_WORKFLOW_SHA=6060cd4b5f971680caecdcc657faccb5720d1c3e 2025-05-07T19:42:58.2753912Z GITHUB_REF_NAME=4066/merge 2025-05-07T19:42:58.2754215Z KERN_NAME=Linux 2025-05-07T19:42:58.2754466Z GITHUB_JOB=build_artifact 2025-05-07T19:42:58.2754786Z GITHUB_REPOSITORY=pytorch/FBGEMM 2025-05-07T19:42:58.2755094Z GITHUB_RETENTION_DAYS=90 2025-05-07T19:42:58.2755401Z RUNNER_WORKSPACE=/__w/FBGEMM 2025-05-07T19:42:58.2755686Z GITHUB_ACTION_REPOSITORY= 2025-05-07T19:42:58.2756090Z PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-05-07T19:42:58.2756504Z GITHUB_BASE_REF=main 2025-05-07T19:42:58.2756780Z CI=true 2025-05-07T19:42:58.2757019Z GITHUB_REPOSITORY_OWNER=pytorch 2025-05-07T19:42:58.2757352Z GITHUB_HEAD_REF=bm/genai-rocm-oss-6 2025-05-07T19:42:58.2757678Z GITHUB_ACTION_REF= 2025-05-07T19:42:58.2757941Z GITHUB_WORKFLOW=FBGEMM GPU/GenAI CUDA CI 2025-05-07T19:42:58.2758593Z GITHUB_OUTPUT=/__w/_temp/_runner_file_commands/set_output_f9e83ba3-e620-4bf5-9b38-cc3fd29d25d2 2025-05-07T19:42:58.2759083Z MACHINE_NAME=x86_64 2025-05-07T19:42:58.2759338Z _=/usr/bin/printenv 2025-05-07T19:42:58.2759479Z 2025-05-07T19:42:58.2759601Z ################################################################################ 2025-05-07T19:42:58.2759955Z [INFO] Print ldd version ... 2025-05-07T19:42:58.2760229Z + ldd --version 2025-05-07T19:42:58.2760397Z 2025-05-07T19:42:58.2760510Z ldd (GNU libc) 2.34 2025-05-07T19:42:58.2760816Z Copyright (C) 2021 Free Software Foundation, Inc. 2025-05-07T19:42:58.2761277Z This is free software; see the source for copying conditions. There is NO 2025-05-07T19:42:58.2761861Z warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 2025-05-07T19:42:58.2762336Z Written by Roland McGrath and Ulrich Drepper. 2025-05-07T19:42:58.2762592Z 2025-05-07T19:42:58.2762723Z ################################################################################ 2025-05-07T19:42:58.2763079Z [INFO] Print CPU info ... 2025-05-07T19:42:58.2763333Z + nproc 2025-05-07T19:42:58.2763453Z 2025-05-07T19:42:58.2765769Z 96 2025-05-07T19:42:58.2765890Z 2025-05-07T19:42:58.2766477Z + lscpu 2025-05-07T19:42:58.2766619Z 2025-05-07T19:42:58.3026463Z Architecture: x86_64 2025-05-07T19:42:58.3027658Z CPU op-mode(s): 32-bit, 64-bit 2025-05-07T19:42:58.3028449Z Address sizes: 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3028901Z Byte Order: Little Endian 2025-05-07T19:42:58.3029252Z CPU(s): 96 2025-05-07T19:42:58.3029593Z On-line CPU(s) list: 0-95 2025-05-07T19:42:58.3030059Z Vendor ID: GenuineIntel 2025-05-07T19:42:58.3030610Z Model name: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3031023Z CPU family: 6 2025-05-07T19:42:58.3031315Z Model: 85 2025-05-07T19:42:58.3031635Z Thread(s) per core: 2 2025-05-07T19:42:58.3031938Z Core(s) per socket: 24 2025-05-07T19:42:58.3032255Z Socket(s): 2 2025-05-07T19:42:58.3032538Z Stepping: 7 2025-05-07T19:42:58.3032860Z BogoMIPS: 6000.01 2025-05-07T19:42:58.3035368Z Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3037681Z Hypervisor vendor: KVM 2025-05-07T19:42:58.3038123Z Virtualization type: full 2025-05-07T19:42:58.3038496Z L1d cache: 1.5 MiB (48 instances) 2025-05-07T19:42:58.3038867Z L1i cache: 1.5 MiB (48 instances) 2025-05-07T19:42:58.3039264Z L2 cache: 48 MiB (48 instances) 2025-05-07T19:42:58.3039628Z L3 cache: 71.5 MiB (2 instances) 2025-05-07T19:42:58.3039992Z NUMA node(s): 2 2025-05-07T19:42:58.3040330Z NUMA node0 CPU(s): 0-23,48-71 2025-05-07T19:42:58.3040666Z NUMA node1 CPU(s): 24-47,72-95 2025-05-07T19:42:58.3041143Z Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status 2025-05-07T19:42:58.3041863Z Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported 2025-05-07T19:42:58.3042393Z Vulnerability L1tf: Mitigation; PTE Inversion 2025-05-07T19:42:58.3043196Z Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:42:58.3043845Z Vulnerability Meltdown: Mitigation; PTI 2025-05-07T19:42:58.3044517Z Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:42:58.3045168Z Vulnerability Reg file data sampling: Not affected 2025-05-07T19:42:58.3045602Z Vulnerability Retbleed: Vulnerable 2025-05-07T19:42:58.3045996Z Vulnerability Spec rstack overflow: Not affected 2025-05-07T19:42:58.3046437Z Vulnerability Spec store bypass: Vulnerable 2025-05-07T19:42:58.3047025Z Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization 2025-05-07T19:42:58.3047922Z Vulnerability Spectre v2: Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline 2025-05-07T19:42:58.3048628Z Vulnerability Srbds: Not affected 2025-05-07T19:42:58.3049111Z Vulnerability Tsx async abort: Not affected 2025-05-07T19:42:58.3049411Z 2025-05-07T19:42:58.3049524Z + cat /proc/cpuinfo 2025-05-07T19:42:58.3049678Z 2025-05-07T19:42:58.3049965Z processor : 0 2025-05-07T19:42:58.3050231Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3050534Z cpu family : 6 2025-05-07T19:42:58.3050769Z model : 85 2025-05-07T19:42:58.3051111Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3051504Z stepping : 7 2025-05-07T19:42:58.3051764Z microcode : 0x5003901 2025-05-07T19:42:58.3052021Z cpu MHz : 1200.716 2025-05-07T19:42:58.3052296Z cache size : 36608 KB 2025-05-07T19:42:58.3052552Z physical id : 0 2025-05-07T19:42:58.3052829Z siblings : 48 2025-05-07T19:42:58.3053056Z core id : 0 2025-05-07T19:42:58.3053312Z cpu cores : 24 2025-05-07T19:42:58.3053544Z apicid : 0 2025-05-07T19:42:58.3053802Z initial apicid : 0 2025-05-07T19:42:58.3054072Z fpu : yes 2025-05-07T19:42:58.3054308Z fpu_exception : yes 2025-05-07T19:42:58.3054626Z cpuid level : 13 2025-05-07T19:42:58.3054997Z wp : yes 2025-05-07T19:42:58.3057376Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3060189Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3060822Z bogomips : 6000.01 2025-05-07T19:42:58.3061085Z clflush size : 64 2025-05-07T19:42:58.3061307Z cache_alignment : 64 2025-05-07T19:42:58.3061604Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3062002Z power management: 2025-05-07T19:42:58.3062158Z 2025-05-07T19:42:58.3062246Z processor : 1 2025-05-07T19:42:58.3062481Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3062723Z cpu family : 6 2025-05-07T19:42:58.3062955Z model : 85 2025-05-07T19:42:58.3063240Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3063614Z stepping : 7 2025-05-07T19:42:58.3063839Z microcode : 0x5003901 2025-05-07T19:42:58.3064097Z cpu MHz : 1257.869 2025-05-07T19:42:58.3064329Z cache size : 36608 KB 2025-05-07T19:42:58.3064588Z physical id : 0 2025-05-07T19:42:58.3064830Z siblings : 48 2025-05-07T19:42:58.3065089Z core id : 1 2025-05-07T19:42:58.3065323Z cpu cores : 24 2025-05-07T19:42:58.3065583Z apicid : 2 2025-05-07T19:42:58.3065845Z initial apicid : 2 2025-05-07T19:42:58.3066081Z fpu : yes 2025-05-07T19:42:58.3066335Z fpu_exception : yes 2025-05-07T19:42:58.3066574Z cpuid level : 13 2025-05-07T19:42:58.3066831Z wp : yes 2025-05-07T19:42:58.3069162Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3071902Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3072548Z bogomips : 6000.01 2025-05-07T19:42:58.3072801Z clflush size : 64 2025-05-07T19:42:58.3073076Z cache_alignment : 64 2025-05-07T19:42:58.3073402Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3073756Z power management: 2025-05-07T19:42:58.3073902Z 2025-05-07T19:42:58.3074033Z processor : 2 2025-05-07T19:42:58.3074272Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3074560Z cpu family : 6 2025-05-07T19:42:58.3074990Z model : 85 2025-05-07T19:42:58.3075319Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3075785Z stepping : 7 2025-05-07T19:42:58.3076079Z microcode : 0x5003901 2025-05-07T19:42:58.3076331Z cpu MHz : 1199.411 2025-05-07T19:42:58.3076604Z cache size : 36608 KB 2025-05-07T19:42:58.3076878Z physical id : 0 2025-05-07T19:42:58.3077106Z siblings : 48 2025-05-07T19:42:58.3077354Z core id : 2 2025-05-07T19:42:58.3077574Z cpu cores : 24 2025-05-07T19:42:58.3077818Z apicid : 4 2025-05-07T19:42:58.3078041Z initial apicid : 4 2025-05-07T19:42:58.3078298Z fpu : yes 2025-05-07T19:42:58.3078517Z fpu_exception : yes 2025-05-07T19:42:58.3078782Z cpuid level : 13 2025-05-07T19:42:58.3079010Z wp : yes 2025-05-07T19:42:58.3081360Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3084281Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3084879Z bogomips : 6000.01 2025-05-07T19:42:58.3085146Z clflush size : 64 2025-05-07T19:42:58.3085428Z cache_alignment : 64 2025-05-07T19:42:58.3085725Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3086103Z power management: 2025-05-07T19:42:58.3086246Z 2025-05-07T19:42:58.3086346Z processor : 3 2025-05-07T19:42:58.3086705Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3086963Z cpu family : 6 2025-05-07T19:42:58.3087213Z model : 85 2025-05-07T19:42:58.3087507Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3087909Z stepping : 7 2025-05-07T19:42:58.3088139Z microcode : 0x5003901 2025-05-07T19:42:58.3088424Z cpu MHz : 1199.240 2025-05-07T19:42:58.3088684Z cache size : 36608 KB 2025-05-07T19:42:58.3088926Z physical id : 0 2025-05-07T19:42:58.3089184Z siblings : 48 2025-05-07T19:42:58.3089405Z core id : 3 2025-05-07T19:42:58.3089652Z cpu cores : 24 2025-05-07T19:42:58.3089876Z apicid : 6 2025-05-07T19:42:58.3090128Z initial apicid : 6 2025-05-07T19:42:58.3090359Z fpu : yes 2025-05-07T19:42:58.3090607Z fpu_exception : yes 2025-05-07T19:42:58.3090853Z cpuid level : 13 2025-05-07T19:42:58.3091119Z wp : yes 2025-05-07T19:42:58.3093408Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3096314Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3096962Z bogomips : 6000.01 2025-05-07T19:42:58.3097209Z clflush size : 64 2025-05-07T19:42:58.3097483Z cache_alignment : 64 2025-05-07T19:42:58.3097815Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3098165Z power management: 2025-05-07T19:42:58.3098315Z 2025-05-07T19:42:58.3098448Z processor : 4 2025-05-07T19:42:58.3098688Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3098996Z cpu family : 6 2025-05-07T19:42:58.3099232Z model : 85 2025-05-07T19:42:58.3099567Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3099949Z stepping : 7 2025-05-07T19:42:58.3100217Z microcode : 0x5003901 2025-05-07T19:42:58.3100471Z cpu MHz : 1207.834 2025-05-07T19:42:58.3100733Z cache size : 36608 KB 2025-05-07T19:42:58.3101007Z physical id : 0 2025-05-07T19:42:58.3101236Z siblings : 48 2025-05-07T19:42:58.3101478Z core id : 4 2025-05-07T19:42:58.3101692Z cpu cores : 24 2025-05-07T19:42:58.3101937Z apicid : 8 2025-05-07T19:42:58.3102152Z initial apicid : 8 2025-05-07T19:42:58.3102403Z fpu : yes 2025-05-07T19:42:58.3102624Z fpu_exception : yes 2025-05-07T19:42:58.3102890Z cpuid level : 13 2025-05-07T19:42:58.3103121Z wp : yes 2025-05-07T19:42:58.3105469Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3108375Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3108968Z bogomips : 6000.01 2025-05-07T19:42:58.3109237Z clflush size : 64 2025-05-07T19:42:58.3109507Z cache_alignment : 64 2025-05-07T19:42:58.3109800Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3110169Z power management: 2025-05-07T19:42:58.3110312Z 2025-05-07T19:42:58.3110410Z processor : 5 2025-05-07T19:42:58.3110671Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3110926Z cpu family : 6 2025-05-07T19:42:58.3111235Z model : 85 2025-05-07T19:42:58.3111591Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3111984Z stepping : 7 2025-05-07T19:42:58.3112211Z microcode : 0x5003901 2025-05-07T19:42:58.3112484Z cpu MHz : 1199.242 2025-05-07T19:42:58.3112746Z cache size : 36608 KB 2025-05-07T19:42:58.3112996Z physical id : 0 2025-05-07T19:42:58.3113266Z siblings : 48 2025-05-07T19:42:58.3113483Z core id : 5 2025-05-07T19:42:58.3113728Z cpu cores : 24 2025-05-07T19:42:58.3113951Z apicid : 10 2025-05-07T19:42:58.3114195Z initial apicid : 10 2025-05-07T19:42:58.3114431Z fpu : yes 2025-05-07T19:42:58.3114672Z fpu_exception : yes 2025-05-07T19:42:58.3114913Z cpuid level : 13 2025-05-07T19:42:58.3115162Z wp : yes 2025-05-07T19:42:58.3117433Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3120033Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3120653Z bogomips : 6000.01 2025-05-07T19:42:58.3120915Z clflush size : 64 2025-05-07T19:42:58.3121150Z cache_alignment : 64 2025-05-07T19:42:58.3121467Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3121806Z power management: 2025-05-07T19:42:58.3121947Z 2025-05-07T19:42:58.3122066Z processor : 6 2025-05-07T19:42:58.3122300Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3122580Z cpu family : 6 2025-05-07T19:42:58.3122802Z model : 85 2025-05-07T19:42:58.3123115Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3123483Z stepping : 7 2025-05-07T19:42:58.3123733Z microcode : 0x5003901 2025-05-07T19:42:58.3123976Z cpu MHz : 1199.672 2025-05-07T19:42:58.3124234Z cache size : 36608 KB 2025-05-07T19:42:58.3124501Z physical id : 0 2025-05-07T19:42:58.3124725Z siblings : 48 2025-05-07T19:42:58.3124965Z core id : 6 2025-05-07T19:42:58.3125180Z cpu cores : 24 2025-05-07T19:42:58.3125428Z apicid : 12 2025-05-07T19:42:58.3125646Z initial apicid : 12 2025-05-07T19:42:58.3125901Z fpu : yes 2025-05-07T19:42:58.3126119Z fpu_exception : yes 2025-05-07T19:42:58.3126388Z cpuid level : 13 2025-05-07T19:42:58.3126624Z wp : yes 2025-05-07T19:42:58.3128902Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3131626Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3132223Z bogomips : 6000.01 2025-05-07T19:42:58.3132490Z clflush size : 64 2025-05-07T19:42:58.3132750Z cache_alignment : 64 2025-05-07T19:42:58.3133038Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3133411Z power management: 2025-05-07T19:42:58.3133553Z 2025-05-07T19:42:58.3133652Z processor : 7 2025-05-07T19:42:58.3133913Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3134175Z cpu family : 6 2025-05-07T19:42:58.3134427Z model : 85 2025-05-07T19:42:58.3134718Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3135451Z stepping : 7 2025-05-07T19:42:58.3135693Z microcode : 0x5003901 2025-05-07T19:42:58.3135977Z cpu MHz : 3000.006 2025-05-07T19:42:58.3136331Z cache size : 36608 KB 2025-05-07T19:42:58.3136581Z physical id : 0 2025-05-07T19:42:58.3136850Z siblings : 48 2025-05-07T19:42:58.3137076Z core id : 7 2025-05-07T19:42:58.3137320Z cpu cores : 24 2025-05-07T19:42:58.3137551Z apicid : 14 2025-05-07T19:42:58.3137805Z initial apicid : 14 2025-05-07T19:42:58.3138046Z fpu : yes 2025-05-07T19:42:58.3138306Z fpu_exception : yes 2025-05-07T19:42:58.3138555Z cpuid level : 13 2025-05-07T19:42:58.3163978Z wp : yes 2025-05-07T19:42:58.3166278Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3168822Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3169433Z bogomips : 6000.01 2025-05-07T19:42:58.3169675Z clflush size : 64 2025-05-07T19:42:58.3169931Z cache_alignment : 64 2025-05-07T19:42:58.3170215Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3170575Z power management: 2025-05-07T19:42:58.3170714Z 2025-05-07T19:42:58.3170813Z processor : 8 2025-05-07T19:42:58.3171069Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3171350Z cpu family : 6 2025-05-07T19:42:58.3171569Z model : 85 2025-05-07T19:42:58.3171882Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3172236Z stepping : 7 2025-05-07T19:42:58.3172493Z microcode : 0x5003901 2025-05-07T19:42:58.3172731Z cpu MHz : 1199.768 2025-05-07T19:42:58.3172985Z cache size : 36608 KB 2025-05-07T19:42:58.3173228Z physical id : 0 2025-05-07T19:42:58.3173654Z siblings : 48 2025-05-07T19:42:58.3173881Z core id : 8 2025-05-07T19:42:58.3174135Z cpu cores : 24 2025-05-07T19:42:58.3174357Z apicid : 16 2025-05-07T19:42:58.3174613Z initial apicid : 16 2025-05-07T19:42:58.3175253Z fpu : yes 2025-05-07T19:42:58.3175478Z fpu_exception : yes 2025-05-07T19:42:58.3175951Z cpuid level : 13 2025-05-07T19:42:58.3176182Z wp : yes 2025-05-07T19:42:58.3178605Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3181302Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3182115Z bogomips : 6000.01 2025-05-07T19:42:58.3182385Z clflush size : 64 2025-05-07T19:42:58.3182624Z cache_alignment : 64 2025-05-07T19:42:58.3182950Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3183304Z power management: 2025-05-07T19:42:58.3183476Z 2025-05-07T19:42:58.3183575Z processor : 9 2025-05-07T19:42:58.3183845Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3184109Z cpu family : 6 2025-05-07T19:42:58.3184368Z model : 85 2025-05-07T19:42:58.3184707Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3185111Z stepping : 7 2025-05-07T19:42:58.3185341Z microcode : 0x5003901 2025-05-07T19:42:58.3185712Z cpu MHz : 3000.006 2025-05-07T19:42:58.3185958Z cache size : 36608 KB 2025-05-07T19:42:58.3186235Z physical id : 0 2025-05-07T19:42:58.3186473Z siblings : 48 2025-05-07T19:42:58.3186732Z core id : 9 2025-05-07T19:42:58.3186967Z cpu cores : 24 2025-05-07T19:42:58.3187223Z apicid : 18 2025-05-07T19:42:58.3187454Z initial apicid : 18 2025-05-07T19:42:58.3187720Z fpu : yes 2025-05-07T19:42:58.3187990Z fpu_exception : yes 2025-05-07T19:42:58.3188237Z cpuid level : 13 2025-05-07T19:42:58.3188493Z wp : yes 2025-05-07T19:42:58.3190805Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3193522Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3194161Z bogomips : 6000.01 2025-05-07T19:42:58.3194403Z clflush size : 64 2025-05-07T19:42:58.3194674Z cache_alignment : 64 2025-05-07T19:42:58.3194967Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3195339Z power management: 2025-05-07T19:42:58.3195484Z 2025-05-07T19:42:58.3195582Z processor : 10 2025-05-07T19:42:58.3195849Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3196136Z cpu family : 6 2025-05-07T19:42:58.3196365Z model : 85 2025-05-07T19:42:58.3196686Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3197060Z stepping : 7 2025-05-07T19:42:58.3197319Z microcode : 0x5003901 2025-05-07T19:42:58.3197567Z cpu MHz : 3000.006 2025-05-07T19:42:58.3197842Z cache size : 36608 KB 2025-05-07T19:42:58.3198091Z physical id : 0 2025-05-07T19:42:58.3198349Z siblings : 48 2025-05-07T19:42:58.3198574Z core id : 10 2025-05-07T19:42:58.3198833Z cpu cores : 24 2025-05-07T19:42:58.3199058Z apicid : 20 2025-05-07T19:42:58.3199299Z initial apicid : 20 2025-05-07T19:42:58.3199556Z fpu : yes 2025-05-07T19:42:58.3199777Z fpu_exception : yes 2025-05-07T19:42:58.3200024Z cpuid level : 13 2025-05-07T19:42:58.3200248Z wp : yes 2025-05-07T19:42:58.3202567Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3205316Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3205967Z bogomips : 6000.01 2025-05-07T19:42:58.3206215Z clflush size : 64 2025-05-07T19:42:58.3206448Z cache_alignment : 64 2025-05-07T19:42:58.3206756Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3207095Z power management: 2025-05-07T19:42:58.3207263Z 2025-05-07T19:42:58.3207360Z processor : 11 2025-05-07T19:42:58.3207629Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3207887Z cpu family : 6 2025-05-07T19:42:58.3208137Z model : 85 2025-05-07T19:42:58.3208425Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3208818Z stepping : 7 2025-05-07T19:42:58.3209038Z microcode : 0x5003901 2025-05-07T19:42:58.3209300Z cpu MHz : 3000.006 2025-05-07T19:42:58.3209530Z cache size : 36608 KB 2025-05-07T19:42:58.3209860Z physical id : 0 2025-05-07T19:42:58.3210091Z siblings : 48 2025-05-07T19:42:58.3210334Z core id : 11 2025-05-07T19:42:58.3210554Z cpu cores : 24 2025-05-07T19:42:58.3210799Z apicid : 22 2025-05-07T19:42:58.3211058Z initial apicid : 22 2025-05-07T19:42:58.3211286Z fpu : yes 2025-05-07T19:42:58.3211512Z fpu_exception : yes 2025-05-07T19:42:58.3211735Z cpuid level : 13 2025-05-07T19:42:58.3211976Z wp : yes 2025-05-07T19:42:58.3214216Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3217182Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3217826Z bogomips : 6000.01 2025-05-07T19:42:58.3218083Z clflush size : 64 2025-05-07T19:42:58.3218349Z cache_alignment : 64 2025-05-07T19:42:58.3218637Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3219010Z power management: 2025-05-07T19:42:58.3219154Z 2025-05-07T19:42:58.3219255Z processor : 12 2025-05-07T19:42:58.3219521Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3219809Z cpu family : 6 2025-05-07T19:42:58.3220034Z model : 85 2025-05-07T19:42:58.3220348Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3220722Z stepping : 7 2025-05-07T19:42:58.3220993Z microcode : 0x5003901 2025-05-07T19:42:58.3221244Z cpu MHz : 1198.611 2025-05-07T19:42:58.3221501Z cache size : 36608 KB 2025-05-07T19:42:58.3221733Z physical id : 0 2025-05-07T19:42:58.3222000Z siblings : 48 2025-05-07T19:42:58.3222223Z core id : 12 2025-05-07T19:42:58.3222458Z cpu cores : 24 2025-05-07T19:42:58.3222681Z apicid : 24 2025-05-07T19:42:58.3222924Z initial apicid : 24 2025-05-07T19:42:58.3223188Z fpu : yes 2025-05-07T19:42:58.3223403Z fpu_exception : yes 2025-05-07T19:42:58.3223658Z cpuid level : 13 2025-05-07T19:42:58.3223889Z wp : yes 2025-05-07T19:42:58.3226215Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3228995Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3229586Z bogomips : 6000.01 2025-05-07T19:42:58.3229838Z clflush size : 64 2025-05-07T19:42:58.3230148Z cache_alignment : 64 2025-05-07T19:42:58.3230442Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3230797Z power management: 2025-05-07T19:42:58.3230961Z 2025-05-07T19:42:58.3231062Z processor : 13 2025-05-07T19:42:58.3231311Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3231566Z cpu family : 6 2025-05-07T19:42:58.3231816Z model : 85 2025-05-07T19:42:58.3232107Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3232496Z stepping : 7 2025-05-07T19:42:58.3232717Z microcode : 0x5003901 2025-05-07T19:42:58.3232985Z cpu MHz : 1197.253 2025-05-07T19:42:58.3233219Z cache size : 36608 KB 2025-05-07T19:42:58.3233462Z physical id : 0 2025-05-07T19:42:58.3233678Z siblings : 48 2025-05-07T19:42:58.3233916Z core id : 13 2025-05-07T19:42:58.3234222Z cpu cores : 24 2025-05-07T19:42:58.3234470Z apicid : 26 2025-05-07T19:42:58.3234693Z initial apicid : 26 2025-05-07T19:42:58.3234947Z fpu : yes 2025-05-07T19:42:58.3235158Z fpu_exception : yes 2025-05-07T19:42:58.3235426Z cpuid level : 13 2025-05-07T19:42:58.3235648Z wp : yes 2025-05-07T19:42:58.3238100Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3240810Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3241421Z bogomips : 6000.01 2025-05-07T19:42:58.3241679Z clflush size : 64 2025-05-07T19:42:58.3241919Z cache_alignment : 64 2025-05-07T19:42:58.3242239Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3242584Z power management: 2025-05-07T19:42:58.3242753Z 2025-05-07T19:42:58.3242961Z processor : 14 2025-05-07T19:42:58.3243220Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3243476Z cpu family : 6 2025-05-07T19:42:58.3243728Z model : 85 2025-05-07T19:42:58.3244190Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3244628Z stepping : 7 2025-05-07T19:42:58.3244859Z microcode : 0x5003901 2025-05-07T19:42:58.3245135Z cpu MHz : 1198.812 2025-05-07T19:42:58.3245379Z cache size : 36608 KB 2025-05-07T19:42:58.3245646Z physical id : 0 2025-05-07T19:42:58.3245876Z siblings : 48 2025-05-07T19:42:58.3246126Z core id : 14 2025-05-07T19:42:58.3246352Z cpu cores : 24 2025-05-07T19:42:58.3246608Z apicid : 28 2025-05-07T19:42:58.3246855Z initial apicid : 28 2025-05-07T19:42:58.3247092Z fpu : yes 2025-05-07T19:42:58.3247337Z fpu_exception : yes 2025-05-07T19:42:58.3247577Z cpuid level : 13 2025-05-07T19:42:58.3247837Z wp : yes 2025-05-07T19:42:58.3250133Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3253012Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3253641Z bogomips : 6000.01 2025-05-07T19:42:58.3253879Z clflush size : 64 2025-05-07T19:42:58.3254138Z cache_alignment : 64 2025-05-07T19:42:58.3254429Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3259605Z power management: 2025-05-07T19:42:58.3259756Z 2025-05-07T19:42:58.3259886Z processor : 15 2025-05-07T19:42:58.3260125Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3260417Z cpu family : 6 2025-05-07T19:42:58.3260640Z model : 85 2025-05-07T19:42:58.3260958Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3261334Z stepping : 7 2025-05-07T19:42:58.3261583Z microcode : 0x5003901 2025-05-07T19:42:58.3261827Z cpu MHz : 3000.006 2025-05-07T19:42:58.3262087Z cache size : 36608 KB 2025-05-07T19:42:58.3262327Z physical id : 0 2025-05-07T19:42:58.3262577Z siblings : 48 2025-05-07T19:42:58.3262797Z core id : 15 2025-05-07T19:42:58.3263040Z cpu cores : 24 2025-05-07T19:42:58.3263285Z apicid : 30 2025-05-07T19:42:58.3263634Z initial apicid : 30 2025-05-07T19:42:58.3263900Z fpu : yes 2025-05-07T19:42:58.3264118Z fpu_exception : yes 2025-05-07T19:42:58.3264377Z cpuid level : 13 2025-05-07T19:42:58.3264602Z wp : yes 2025-05-07T19:42:58.3266948Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3269658Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3270264Z bogomips : 6000.01 2025-05-07T19:42:58.3270529Z clflush size : 64 2025-05-07T19:42:58.3270768Z cache_alignment : 64 2025-05-07T19:42:58.3271080Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3271429Z power management: 2025-05-07T19:42:58.3271602Z 2025-05-07T19:42:58.3271699Z processor : 16 2025-05-07T19:42:58.3272074Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3272329Z cpu family : 6 2025-05-07T19:42:58.3272574Z model : 85 2025-05-07T19:42:58.3272856Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3273441Z stepping : 7 2025-05-07T19:42:58.3273956Z microcode : 0x5003901 2025-05-07T19:42:58.3274261Z cpu MHz : 1199.727 2025-05-07T19:42:58.3274529Z cache size : 36608 KB 2025-05-07T19:42:58.3274975Z physical id : 0 2025-05-07T19:42:58.3275238Z siblings : 48 2025-05-07T19:42:58.3275481Z core id : 16 2025-05-07T19:42:58.3275768Z cpu cores : 24 2025-05-07T19:42:58.3276054Z apicid : 32 2025-05-07T19:42:58.3276313Z initial apicid : 32 2025-05-07T19:42:58.3276559Z fpu : yes 2025-05-07T19:42:58.3276800Z fpu_exception : yes 2025-05-07T19:42:58.3277044Z cpuid level : 13 2025-05-07T19:42:58.3277303Z wp : yes 2025-05-07T19:42:58.3279655Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3282330Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3282972Z bogomips : 6000.01 2025-05-07T19:42:58.3283246Z clflush size : 64 2025-05-07T19:42:58.3283480Z cache_alignment : 64 2025-05-07T19:42:58.3283796Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3284138Z power management: 2025-05-07T19:42:58.3284410Z 2025-05-07T19:42:58.3284534Z processor : 17 2025-05-07T19:42:58.3284774Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3285059Z cpu family : 6 2025-05-07T19:42:58.3285286Z model : 85 2025-05-07T19:42:58.3285607Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3285983Z stepping : 7 2025-05-07T19:42:58.3286239Z microcode : 0x5003901 2025-05-07T19:42:58.3286505Z cpu MHz : 1197.097 2025-05-07T19:42:58.3286738Z cache size : 36608 KB 2025-05-07T19:42:58.3287007Z physical id : 0 2025-05-07T19:42:58.3287231Z siblings : 48 2025-05-07T19:42:58.3287467Z core id : 17 2025-05-07T19:42:58.3287821Z cpu cores : 24 2025-05-07T19:42:58.3288047Z apicid : 34 2025-05-07T19:42:58.3288261Z initial apicid : 34 2025-05-07T19:42:58.3288498Z fpu : yes 2025-05-07T19:42:58.3288794Z fpu_exception : yes 2025-05-07T19:42:58.3289046Z cpuid level : 13 2025-05-07T19:42:58.3289257Z wp : yes 2025-05-07T19:42:58.3291420Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3293912Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3294504Z bogomips : 6000.01 2025-05-07T19:42:58.3294729Z clflush size : 64 2025-05-07T19:42:58.3295057Z cache_alignment : 64 2025-05-07T19:42:58.3295519Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3295890Z power management: 2025-05-07T19:42:58.3296035Z 2025-05-07T19:42:58.3296132Z processor : 18 2025-05-07T19:42:58.3296489Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3296752Z cpu family : 6 2025-05-07T19:42:58.3297000Z model : 85 2025-05-07T19:42:58.3297295Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3297695Z stepping : 7 2025-05-07T19:42:58.3297929Z microcode : 0x5003901 2025-05-07T19:42:58.3298201Z cpu MHz : 3000.006 2025-05-07T19:42:58.3298465Z cache size : 36608 KB 2025-05-07T19:42:58.3298710Z physical id : 0 2025-05-07T19:42:58.3298965Z siblings : 48 2025-05-07T19:42:58.3299184Z core id : 18 2025-05-07T19:42:58.3299436Z cpu cores : 24 2025-05-07T19:42:58.3299658Z apicid : 36 2025-05-07T19:42:58.3299909Z initial apicid : 36 2025-05-07T19:42:58.3300146Z fpu : yes 2025-05-07T19:42:58.3300389Z fpu_exception : yes 2025-05-07T19:42:58.3300630Z cpuid level : 13 2025-05-07T19:42:58.3300881Z wp : yes 2025-05-07T19:42:58.3303211Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3305894Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3306518Z bogomips : 6000.01 2025-05-07T19:42:58.3306774Z clflush size : 64 2025-05-07T19:42:58.3307008Z cache_alignment : 64 2025-05-07T19:42:58.3307329Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3307784Z power management: 2025-05-07T19:42:58.3307916Z 2025-05-07T19:42:58.3308027Z processor : 19 2025-05-07T19:42:58.3308248Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3308597Z cpu family : 6 2025-05-07T19:42:58.3308807Z model : 85 2025-05-07T19:42:58.3309114Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3309466Z stepping : 7 2025-05-07T19:42:58.3309706Z microcode : 0x5003901 2025-05-07T19:42:58.3309936Z cpu MHz : 3000.006 2025-05-07T19:42:58.3310172Z cache size : 36608 KB 2025-05-07T19:42:58.3310402Z physical id : 0 2025-05-07T19:42:58.3310633Z siblings : 48 2025-05-07T19:42:58.3310832Z core id : 19 2025-05-07T19:42:58.3311053Z cpu cores : 24 2025-05-07T19:42:58.3311258Z apicid : 38 2025-05-07T19:42:58.3311487Z initial apicid : 38 2025-05-07T19:42:58.3311699Z fpu : yes 2025-05-07T19:42:58.3311887Z fpu_exception : yes 2025-05-07T19:42:58.3312102Z cpuid level : 13 2025-05-07T19:42:58.3312297Z wp : yes 2025-05-07T19:42:58.3314492Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3316956Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3317494Z bogomips : 6000.01 2025-05-07T19:42:58.3317721Z clflush size : 64 2025-05-07T19:42:58.3317928Z cache_alignment : 64 2025-05-07T19:42:58.3318184Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3318490Z power management: 2025-05-07T19:42:58.3318636Z 2025-05-07T19:42:58.3318721Z processor : 20 2025-05-07T19:42:58.3318935Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3319157Z cpu family : 6 2025-05-07T19:42:58.3319363Z model : 85 2025-05-07T19:42:58.3319616Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3319964Z stepping : 7 2025-05-07T19:42:58.3320167Z microcode : 0x5003901 2025-05-07T19:42:58.3320402Z cpu MHz : 3000.006 2025-05-07T19:42:58.3320606Z cache size : 36608 KB 2025-05-07T19:42:58.3320837Z physical id : 0 2025-05-07T19:42:58.3321030Z siblings : 48 2025-05-07T19:42:58.3321235Z core id : 20 2025-05-07T19:42:58.3321417Z cpu cores : 24 2025-05-07T19:42:58.3321624Z apicid : 40 2025-05-07T19:42:58.3321828Z initial apicid : 40 2025-05-07T19:42:58.3322033Z fpu : yes 2025-05-07T19:42:58.3322240Z fpu_exception : yes 2025-05-07T19:42:58.3322444Z cpuid level : 13 2025-05-07T19:42:58.3322651Z wp : yes 2025-05-07T19:42:58.3324762Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3327242Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3327806Z bogomips : 6000.01 2025-05-07T19:42:58.3328018Z clflush size : 64 2025-05-07T19:42:58.3328246Z cache_alignment : 64 2025-05-07T19:42:58.3328506Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3328819Z power management: 2025-05-07T19:42:58.3328949Z 2025-05-07T19:42:58.3329043Z processor : 21 2025-05-07T19:42:58.3329246Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3329481Z cpu family : 6 2025-05-07T19:42:58.3329667Z model : 85 2025-05-07T19:42:58.3330001Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3330350Z stepping : 7 2025-05-07T19:42:58.3330554Z microcode : 0x5003901 2025-05-07T19:42:58.3330791Z cpu MHz : 3000.006 2025-05-07T19:42:58.3330998Z cache size : 36608 KB 2025-05-07T19:42:58.3331228Z physical id : 0 2025-05-07T19:42:58.3331427Z siblings : 48 2025-05-07T19:42:58.3331640Z core id : 21 2025-05-07T19:42:58.3332382Z lspci: Unable to load libkmod resources: error -2 2025-05-07T19:42:58.3332679Z cpu cores : 24 2025-05-07T19:42:58.3332883Z apicid : 42 2025-05-07T19:42:58.3333064Z initial apicid : 42 2025-05-07T19:42:58.3333270Z fpu : yes 2025-05-07T19:42:58.3333452Z fpu_exception : yes 2025-05-07T19:42:58.3333662Z cpuid level : 13 2025-05-07T19:42:58.3333849Z wp : yes 2025-05-07T19:42:58.3336370Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3339040Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3339619Z bogomips : 6000.01 2025-05-07T19:42:58.3339845Z clflush size : 64 2025-05-07T19:42:58.3340063Z cache_alignment : 64 2025-05-07T19:42:58.3340353Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3340701Z power management: 2025-05-07T19:42:58.3340833Z 2025-05-07T19:42:58.3340917Z processor : 22 2025-05-07T19:42:58.3341146Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3341377Z cpu family : 6 2025-05-07T19:42:58.3341598Z model : 85 2025-05-07T19:42:58.3341875Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3342240Z stepping : 7 2025-05-07T19:42:58.3342448Z microcode : 0x5003901 2025-05-07T19:42:58.3342686Z cpu MHz : 1195.697 2025-05-07T19:42:58.3342899Z cache size : 36608 KB 2025-05-07T19:42:58.3343141Z physical id : 0 2025-05-07T19:42:58.3343355Z siblings : 48 2025-05-07T19:42:58.3343573Z core id : 22 2025-05-07T19:42:58.3343782Z cpu cores : 24 2025-05-07T19:42:58.3343984Z apicid : 44 2025-05-07T19:42:58.3344193Z initial apicid : 44 2025-05-07T19:42:58.3344399Z fpu : yes 2025-05-07T19:42:58.3344602Z fpu_exception : yes 2025-05-07T19:42:58.3344812Z cpuid level : 13 2025-05-07T19:42:58.3345017Z wp : yes 2025-05-07T19:42:58.3347294Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3349882Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3350445Z bogomips : 6000.01 2025-05-07T19:42:58.3350639Z clflush size : 64 2025-05-07T19:42:58.3350855Z cache_alignment : 64 2025-05-07T19:42:58.3351105Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3351425Z power management: 2025-05-07T19:42:58.3351550Z 2025-05-07T19:42:58.3351655Z processor : 23 2025-05-07T19:42:58.3351851Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3352086Z cpu family : 6 2025-05-07T19:42:58.3352268Z model : 85 2025-05-07T19:42:58.3352539Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3352932Z stepping : 7 2025-05-07T19:42:58.3353144Z microcode : 0x5003901 2025-05-07T19:42:58.3353512Z cpu MHz : 1200.878 2025-05-07T19:42:58.3353743Z cache size : 36608 KB 2025-05-07T19:42:58.3353950Z physical id : 0 2025-05-07T19:42:58.3354164Z siblings : 48 2025-05-07T19:42:58.3354366Z core id : 23 2025-05-07T19:42:58.3354549Z cpu cores : 24 2025-05-07T19:42:58.3354751Z apicid : 46 2025-05-07T19:42:58.3354937Z initial apicid : 46 2025-05-07T19:42:58.3355144Z fpu : yes 2025-05-07T19:42:58.3355327Z fpu_exception : yes 2025-05-07T19:42:58.3355543Z cpuid level : 13 2025-05-07T19:42:58.3355731Z wp : yes 2025-05-07T19:42:58.3357921Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3360402Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3360947Z bogomips : 6000.01 2025-05-07T19:42:58.3361170Z clflush size : 64 2025-05-07T19:42:58.3361368Z cache_alignment : 64 2025-05-07T19:42:58.3361639Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3361952Z power management: 2025-05-07T19:42:58.3362079Z 2025-05-07T19:42:58.3362155Z processor : 24 2025-05-07T19:42:58.3362377Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3362593Z cpu family : 6 2025-05-07T19:42:58.3362793Z model : 85 2025-05-07T19:42:58.3363042Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3363389Z stepping : 7 2025-05-07T19:42:58.3363581Z microcode : 0x5003901 2025-05-07T19:42:58.3363811Z cpu MHz : 3166.761 2025-05-07T19:42:58.3364007Z cache size : 36608 KB 2025-05-07T19:42:58.3364222Z physical id : 1 2025-05-07T19:42:58.3364425Z siblings : 48 2025-05-07T19:42:58.3364615Z core id : 0 2025-05-07T19:42:58.3364806Z cpu cores : 24 2025-05-07T19:42:58.3364987Z apicid : 64 2025-05-07T19:42:58.3365178Z initial apicid : 64 2025-05-07T19:42:58.3365375Z fpu : yes 2025-05-07T19:42:58.3365570Z fpu_exception : yes 2025-05-07T19:42:58.3365766Z cpuid level : 13 2025-05-07T19:42:58.3365962Z wp : yes 2025-05-07T19:42:58.3368070Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3370537Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3371137Z bogomips : 6000.01 2025-05-07T19:42:58.3371337Z clflush size : 64 2025-05-07T19:42:58.3371536Z cache_alignment : 64 2025-05-07T19:42:58.3371792Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3372086Z power management: 2025-05-07T19:42:58.3372207Z 2025-05-07T19:42:58.3372303Z processor : 25 2025-05-07T19:42:58.3372504Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3372743Z cpu family : 6 2025-05-07T19:42:58.3372931Z model : 85 2025-05-07T19:42:58.3373196Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3373520Z stepping : 7 2025-05-07T19:42:58.3373786Z microcode : 0x5003901 2025-05-07T19:42:58.3373994Z cpu MHz : 3125.398 2025-05-07T19:42:58.3374212Z cache size : 36608 KB 2025-05-07T19:42:58.3374429Z physical id : 1 2025-05-07T19:42:58.3374845Z siblings : 48 2025-05-07T19:42:58.3375285Z core id : 1 2025-05-07T19:42:58.3375513Z cpu cores : 24 2025-05-07T19:42:58.3375874Z apicid : 66 2025-05-07T19:42:58.3376105Z initial apicid : 66 2025-05-07T19:42:58.3376466Z fpu : yes 2025-05-07T19:42:58.3376690Z fpu_exception : yes 2025-05-07T19:42:58.3376957Z cpuid level : 13 2025-05-07T19:42:58.3377196Z wp : yes 2025-05-07T19:42:58.3379655Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3382368Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3382976Z bogomips : 6000.01 2025-05-07T19:42:58.3383238Z clflush size : 64 2025-05-07T19:42:58.3383480Z cache_alignment : 64 2025-05-07T19:42:58.3383785Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3384156Z power management: 2025-05-07T19:42:58.3384299Z 2025-05-07T19:42:58.3384393Z processor : 26 2025-05-07T19:42:58.3384650Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3384903Z cpu family : 6 2025-05-07T19:42:58.3385145Z model : 85 2025-05-07T19:42:58.3385436Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3385827Z stepping : 7 2025-05-07T19:42:58.3386054Z microcode : 0x5003901 2025-05-07T19:42:58.3386328Z cpu MHz : 3114.788 2025-05-07T19:42:58.3386570Z cache size : 36608 KB 2025-05-07T19:42:58.3386847Z physical id : 1 2025-05-07T19:42:58.3387107Z siblings : 48 2025-05-07T19:42:58.3387332Z core id : 2 2025-05-07T19:42:58.3387578Z cpu cores : 24 2025-05-07T19:42:58.3387903Z apicid : 68 2025-05-07T19:42:58.3388139Z initial apicid : 68 2025-05-07T19:42:58.3388353Z fpu : yes 2025-05-07T19:42:58.3388583Z fpu_exception : yes 2025-05-07T19:42:58.3388808Z cpuid level : 13 2025-05-07T19:42:58.3389047Z wp : yes 2025-05-07T19:42:58.3391172Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3393673Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3394259Z bogomips : 6000.01 2025-05-07T19:42:58.3394483Z clflush size : 64 2025-05-07T19:42:58.3394731Z cache_alignment : 64 2025-05-07T19:42:58.3395030Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3395354Z power management: 2025-05-07T19:42:58.3395489Z 2025-05-07T19:42:58.3395601Z processor : 27 2025-05-07T19:42:58.3395824Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3396085Z cpu family : 6 2025-05-07T19:42:58.3396292Z model : 85 2025-05-07T19:42:58.3396588Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3396934Z stepping : 7 2025-05-07T19:42:58.3397167Z microcode : 0x5003901 2025-05-07T19:42:58.3397393Z cpu MHz : 3139.917 2025-05-07T19:42:58.3397733Z cache size : 36608 KB 2025-05-07T19:42:58.3397990Z physical id : 1 2025-05-07T19:42:58.3398208Z siblings : 48 2025-05-07T19:42:58.3398442Z core id : 3 2025-05-07T19:42:58.3398645Z cpu cores : 24 2025-05-07T19:42:58.3398881Z apicid : 70 2025-05-07T19:42:58.3399089Z initial apicid : 70 2025-05-07T19:42:58.3399333Z fpu : yes 2025-05-07T19:42:58.3399537Z fpu_exception : yes 2025-05-07T19:42:58.3399780Z cpuid level : 13 2025-05-07T19:42:58.3399988Z wp : yes 2025-05-07T19:42:58.3402196Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3404696Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3405256Z bogomips : 6000.01 2025-05-07T19:42:58.3405493Z clflush size : 64 2025-05-07T19:42:58.3405713Z cache_alignment : 64 2025-05-07T19:42:58.3406004Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3406350Z power management: 2025-05-07T19:42:58.3406484Z 2025-05-07T19:42:58.3406574Z processor : 28 2025-05-07T19:42:58.3406822Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3407060Z cpu family : 6 2025-05-07T19:42:58.3407291Z model : 85 2025-05-07T19:42:58.3407568Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3407944Z stepping : 7 2025-05-07T19:42:58.3408163Z microcode : 0x5003901 2025-05-07T19:42:58.3408424Z cpu MHz : 3000.006 2025-05-07T19:42:58.3408649Z cache size : 36608 KB 2025-05-07T19:42:58.3408910Z physical id : 1 2025-05-07T19:42:58.3409144Z siblings : 48 2025-05-07T19:42:58.3409353Z core id : 4 2025-05-07T19:42:58.3409588Z cpu cores : 24 2025-05-07T19:42:58.3409793Z apicid : 72 2025-05-07T19:42:58.3410025Z initial apicid : 72 2025-05-07T19:42:58.3410247Z fpu : yes 2025-05-07T19:42:58.3410473Z fpu_exception : yes 2025-05-07T19:42:58.3410667Z cpuid level : 13 2025-05-07T19:42:58.3410874Z wp : yes 2025-05-07T19:42:58.3412979Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3415744Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3416357Z bogomips : 6000.01 2025-05-07T19:42:58.3416578Z clflush size : 64 2025-05-07T19:42:58.3416829Z cache_alignment : 64 2025-05-07T19:42:58.3417118Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3417454Z power management: 2025-05-07T19:42:58.3417593Z 2025-05-07T19:42:58.3417709Z processor : 29 2025-05-07T19:42:58.3417939Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3418204Z cpu family : 6 2025-05-07T19:42:58.3418412Z model : 85 2025-05-07T19:42:58.3418708Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3419056Z stepping : 7 2025-05-07T19:42:58.3419280Z microcode : 0x5003901 2025-05-07T19:42:58.3419517Z cpu MHz : 3142.161 2025-05-07T19:42:58.3419763Z cache size : 36608 KB 2025-05-07T19:42:58.3420004Z physical id : 1 2025-05-07T19:42:58.3420240Z siblings : 48 2025-05-07T19:42:58.3420546Z core id : 5 2025-05-07T19:42:58.3420756Z cpu cores : 24 2025-05-07T19:42:58.3420988Z apicid : 74 2025-05-07T19:42:58.3421203Z initial apicid : 74 2025-05-07T19:42:58.3421439Z fpu : yes 2025-05-07T19:42:58.3421649Z fpu_exception : yes 2025-05-07T19:42:58.3421880Z cpuid level : 13 2025-05-07T19:42:58.3422088Z wp : yes 2025-05-07T19:42:58.3424493Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3427162Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3427855Z bogomips : 6000.01 2025-05-07T19:42:58.3428081Z clflush size : 64 2025-05-07T19:42:58.3428288Z cache_alignment : 64 2025-05-07T19:42:58.3428553Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3428870Z power management: 2025-05-07T19:42:58.3428994Z 2025-05-07T19:42:58.3429069Z processor : 30 2025-05-07T19:42:58.3429291Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3429514Z cpu family : 6 2025-05-07T19:42:58.3429719Z model : 85 2025-05-07T19:42:58.3429983Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3430327Z stepping : 7 2025-05-07T19:42:58.3430515Z microcode : 0x5003901 2025-05-07T19:42:58.3430743Z cpu MHz : 3000.006 2025-05-07T19:42:58.3430933Z cache size : 36608 KB 2025-05-07T19:42:58.3431145Z physical id : 1 2025-05-07T19:42:58.3431349Z siblings : 48 2025-05-07T19:42:58.3431535Z core id : 6 2025-05-07T19:42:58.3431734Z cpu cores : 24 2025-05-07T19:42:58.3431924Z apicid : 76 2025-05-07T19:42:58.3432134Z initial apicid : 76 2025-05-07T19:42:58.3432327Z fpu : yes 2025-05-07T19:42:58.3432530Z fpu_exception : yes 2025-05-07T19:42:58.3432729Z cpuid level : 13 2025-05-07T19:42:58.3432944Z wp : yes 2025-05-07T19:42:58.3435063Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3437543Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3438105Z bogomips : 6000.01 2025-05-07T19:42:58.3438312Z clflush size : 64 2025-05-07T19:42:58.3438524Z cache_alignment : 64 2025-05-07T19:42:58.3438789Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3439082Z power management: 2025-05-07T19:42:58.3439204Z 2025-05-07T19:42:58.3439303Z processor : 31 2025-05-07T19:42:58.3439508Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3439738Z cpu family : 6 2025-05-07T19:42:58.3439923Z model : 85 2025-05-07T19:42:58.3440200Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3440529Z stepping : 7 2025-05-07T19:42:58.3440737Z microcode : 0x5003901 2025-05-07T19:42:58.3440951Z cpu MHz : 3000.006 2025-05-07T19:42:58.3441163Z cache size : 36608 KB 2025-05-07T19:42:58.3441389Z physical id : 1 2025-05-07T19:42:58.3441583Z siblings : 48 2025-05-07T19:42:58.3441779Z core id : 7 2025-05-07T19:42:58.3441963Z cpu cores : 24 2025-05-07T19:42:58.3442237Z apicid : 78 2025-05-07T19:42:58.3442421Z initial apicid : 78 2025-05-07T19:42:58.3442630Z fpu : yes 2025-05-07T19:42:58.3442819Z fpu_exception : yes 2025-05-07T19:42:58.3443029Z cpuid level : 13 2025-05-07T19:42:58.3443229Z wp : yes 2025-05-07T19:42:58.3445420Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3447966Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3448524Z bogomips : 6000.01 2025-05-07T19:42:58.3448744Z clflush size : 64 2025-05-07T19:42:58.3448955Z cache_alignment : 64 2025-05-07T19:42:58.3449221Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3449526Z power management: 2025-05-07T19:42:58.3449648Z 2025-05-07T19:42:58.3449730Z processor : 32 2025-05-07T19:42:58.3449943Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3450159Z cpu family : 6 2025-05-07T19:42:58.3450361Z model : 85 2025-05-07T19:42:58.3450612Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3450945Z stepping : 7 2025-05-07T19:42:58.3451141Z microcode : 0x5003901 2025-05-07T19:42:58.3451349Z cpu MHz : 3145.991 2025-05-07T19:42:58.3451554Z cache size : 36608 KB 2025-05-07T19:42:58.3451928Z physical id : 1 2025-05-07T19:42:58.3452123Z siblings : 48 2025-05-07T19:42:58.3452345Z core id : 8 2025-05-07T19:42:58.3452537Z cpu cores : 24 2025-05-07T19:42:58.3452723Z apicid : 80 2025-05-07T19:42:58.3452929Z initial apicid : 80 2025-05-07T19:42:58.3453135Z fpu : yes 2025-05-07T19:42:58.3453337Z fpu_exception : yes 2025-05-07T19:42:58.3453542Z cpuid level : 13 2025-05-07T19:42:58.3453757Z wp : yes 2025-05-07T19:42:58.3456197Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3458940Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3459558Z bogomips : 6000.01 2025-05-07T19:42:58.3459777Z clflush size : 64 2025-05-07T19:42:58.3460015Z cache_alignment : 64 2025-05-07T19:42:58.3460295Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3460615Z power management: 2025-05-07T19:42:58.3460750Z 2025-05-07T19:42:58.3460845Z processor : 33 2025-05-07T19:42:58.3461051Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3461297Z cpu family : 6 2025-05-07T19:42:58.3461488Z model : 85 2025-05-07T19:42:58.3461765Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3462112Z stepping : 7 2025-05-07T19:42:58.3462325Z microcode : 0x5003901 2025-05-07T19:42:58.3462549Z cpu MHz : 3000.006 2025-05-07T19:42:58.3462782Z cache size : 36608 KB 2025-05-07T19:42:58.3463022Z physical id : 1 2025-05-07T19:42:58.3463225Z siblings : 48 2025-05-07T19:42:58.3463426Z core id : 9 2025-05-07T19:42:58.3463616Z cpu cores : 24 2025-05-07T19:42:58.3463826Z apicid : 82 2025-05-07T19:42:58.3464019Z initial apicid : 82 2025-05-07T19:42:58.3464302Z fpu : yes 2025-05-07T19:42:58.3464488Z fpu_exception : yes 2025-05-07T19:42:58.3464705Z cpuid level : 13 2025-05-07T19:42:58.3464898Z wp : yes 2025-05-07T19:42:58.3467197Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3469829Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3470375Z bogomips : 6000.01 2025-05-07T19:42:58.3470575Z clflush size : 64 2025-05-07T19:42:58.3470774Z cache_alignment : 64 2025-05-07T19:42:58.3471024Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3471327Z power management: 2025-05-07T19:42:58.3471446Z 2025-05-07T19:42:58.3471529Z processor : 34 2025-05-07T19:42:58.3471747Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3471966Z cpu family : 6 2025-05-07T19:42:58.3472161Z model : 85 2025-05-07T19:42:58.3472403Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3472744Z stepping : 7 2025-05-07T19:42:58.3472939Z microcode : 0x5003901 2025-05-07T19:42:58.3473174Z cpu MHz : 3000.006 2025-05-07T19:42:58.3473375Z cache size : 36608 KB 2025-05-07T19:42:58.3473604Z physical id : 1 2025-05-07T19:42:58.3473816Z siblings : 48 2025-05-07T19:42:58.3474002Z core id : 10 2025-05-07T19:42:58.3474192Z cpu cores : 24 2025-05-07T19:42:58.3474379Z apicid : 84 2025-05-07T19:42:58.3474574Z initial apicid : 84 2025-05-07T19:42:58.3474956Z fpu : yes 2025-05-07T19:42:58.3475325Z fpu_exception : yes 2025-05-07T19:42:58.3475409Z cpuid level : 13 2025-05-07T19:42:58.3475490Z wp : yes 2025-05-07T19:42:58.3477689Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3478082Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3478171Z bogomips : 6000.01 2025-05-07T19:42:58.3478273Z clflush size : 64 2025-05-07T19:42:58.3478355Z cache_alignment : 64 2025-05-07T19:42:58.3478489Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3478583Z power management: 2025-05-07T19:42:58.3478588Z 2025-05-07T19:42:58.3478681Z processor : 35 2025-05-07T19:42:58.3478778Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3478858Z cpu family : 6 2025-05-07T19:42:58.3478947Z model : 85 2025-05-07T19:42:58.3479113Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3479199Z stepping : 7 2025-05-07T19:42:58.3479300Z microcode : 0x5003901 2025-05-07T19:42:58.3479383Z cpu MHz : 3000.006 2025-05-07T19:42:58.3479467Z cache size : 36608 KB 2025-05-07T19:42:58.3479554Z physical id : 1 2025-05-07T19:42:58.3479659Z siblings : 48 2025-05-07T19:42:58.3479738Z core id : 11 2025-05-07T19:42:58.3479827Z cpu cores : 24 2025-05-07T19:42:58.3479936Z apicid : 86 2025-05-07T19:42:58.3480031Z initial apicid : 86 2025-05-07T19:42:58.3480114Z fpu : yes 2025-05-07T19:42:58.3480204Z fpu_exception : yes 2025-05-07T19:42:58.3480302Z cpuid level : 13 2025-05-07T19:42:58.3480480Z wp : yes 2025-05-07T19:42:58.3482672Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3483150Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3483242Z bogomips : 6000.01 2025-05-07T19:42:58.3483327Z clflush size : 64 2025-05-07T19:42:58.3483434Z cache_alignment : 64 2025-05-07T19:42:58.3483571Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3483664Z power management: 2025-05-07T19:42:58.3483669Z 2025-05-07T19:42:58.3483770Z processor : 36 2025-05-07T19:42:58.3483866Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3483951Z cpu family : 6 2025-05-07T19:42:58.3484030Z model : 85 2025-05-07T19:42:58.3484219Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3484310Z stepping : 7 2025-05-07T19:42:58.3484402Z microcode : 0x5003901 2025-05-07T19:42:58.3484504Z cpu MHz : 3000.006 2025-05-07T19:42:58.3484591Z cache size : 36608 KB 2025-05-07T19:42:58.3484680Z physical id : 1 2025-05-07T19:42:58.3484767Z siblings : 48 2025-05-07T19:42:58.3484856Z core id : 12 2025-05-07T19:42:58.3484950Z cpu cores : 24 2025-05-07T19:42:58.3485037Z apicid : 88 2025-05-07T19:42:58.3485146Z initial apicid : 88 2025-05-07T19:42:58.3485223Z fpu : yes 2025-05-07T19:42:58.3485313Z fpu_exception : yes 2025-05-07T19:42:58.3485400Z cpuid level : 13 2025-05-07T19:42:58.3485495Z wp : yes 2025-05-07T19:42:58.3487688Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3488102Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3488189Z bogomips : 6000.01 2025-05-07T19:42:58.3488281Z clflush size : 64 2025-05-07T19:42:58.3488374Z cache_alignment : 64 2025-05-07T19:42:58.3488513Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3488601Z power management: 2025-05-07T19:42:58.3488606Z 2025-05-07T19:42:58.3488692Z processor : 37 2025-05-07T19:42:58.3488794Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3488985Z cpu family : 6 2025-05-07T19:42:58.3489058Z model : 85 2025-05-07T19:42:58.3489206Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3489295Z stepping : 7 2025-05-07T19:42:58.3489371Z microcode : 0x5003901 2025-05-07T19:42:58.3489452Z cpu MHz : 3000.006 2025-05-07T19:42:58.3489543Z cache size : 36608 KB 2025-05-07T19:42:58.3489621Z physical id : 1 2025-05-07T19:42:58.3489701Z siblings : 48 2025-05-07T19:42:58.3489781Z core id : 13 2025-05-07T19:42:58.3489872Z cpu cores : 24 2025-05-07T19:42:58.3489947Z apicid : 90 2025-05-07T19:42:58.3490030Z initial apicid : 90 2025-05-07T19:42:58.3490119Z fpu : yes 2025-05-07T19:42:58.3490195Z fpu_exception : yes 2025-05-07T19:42:58.3490271Z cpuid level : 13 2025-05-07T19:42:58.3490347Z wp : yes 2025-05-07T19:42:58.3492488Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3493125Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3493249Z bogomips : 6000.01 2025-05-07T19:42:58.3493391Z clflush size : 64 2025-05-07T19:42:58.3493480Z cache_alignment : 64 2025-05-07T19:42:58.3493609Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3493717Z power management: 2025-05-07T19:42:58.3493725Z 2025-05-07T19:42:58.3493810Z processor : 38 2025-05-07T19:42:58.3493908Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3494026Z cpu family : 6 2025-05-07T19:42:58.3494116Z model : 85 2025-05-07T19:42:58.3494284Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3494374Z stepping : 7 2025-05-07T19:42:58.3494495Z microcode : 0x5003901 2025-05-07T19:42:58.3494588Z cpu MHz : 3000.006 2025-05-07T19:42:58.3494686Z cache size : 36608 KB 2025-05-07T19:42:58.3494942Z physical id : 1 2025-05-07T19:42:58.3495041Z siblings : 48 2025-05-07T19:42:58.3495162Z core id : 14 2025-05-07T19:42:58.3495491Z cpu cores : 24 2025-05-07T19:42:58.3495628Z apicid : 92 2025-05-07T19:42:58.3495822Z initial apicid : 92 2025-05-07T19:42:58.3495964Z fpu : yes 2025-05-07T19:42:58.3496141Z fpu_exception : yes 2025-05-07T19:42:58.3496241Z cpuid level : 13 2025-05-07T19:42:58.3496362Z wp : yes 2025-05-07T19:42:58.3498595Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3499020Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3499149Z bogomips : 6000.01 2025-05-07T19:42:58.3499249Z clflush size : 64 2025-05-07T19:42:58.3499355Z cache_alignment : 64 2025-05-07T19:42:58.3499529Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3499633Z power management: 2025-05-07T19:42:58.3499637Z 2025-05-07T19:42:58.3499737Z processor : 39 2025-05-07T19:42:58.3499872Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3499968Z cpu family : 6 2025-05-07T19:42:58.3500068Z model : 85 2025-05-07T19:42:58.3500245Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3500372Z stepping : 7 2025-05-07T19:42:58.3500471Z microcode : 0x5003901 2025-05-07T19:42:58.3500568Z cpu MHz : 3192.139 2025-05-07T19:42:58.3500668Z cache size : 36608 KB 2025-05-07T19:42:58.3500795Z physical id : 1 2025-05-07T19:42:58.3500892Z siblings : 48 2025-05-07T19:42:58.3500986Z core id : 15 2025-05-07T19:42:58.3501109Z cpu cores : 24 2025-05-07T19:42:58.3501207Z apicid : 94 2025-05-07T19:42:58.3501305Z initial apicid : 94 2025-05-07T19:42:58.3501400Z fpu : yes 2025-05-07T19:42:58.3501524Z fpu_exception : yes 2025-05-07T19:42:58.3501620Z cpuid level : 13 2025-05-07T19:42:58.3501708Z wp : yes 2025-05-07T19:42:58.3503934Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3504403Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3504498Z bogomips : 6000.01 2025-05-07T19:42:58.3504619Z clflush size : 64 2025-05-07T19:42:58.3504716Z cache_alignment : 64 2025-05-07T19:42:58.3506117Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3506251Z power management: 2025-05-07T19:42:58.3506256Z 2025-05-07T19:42:58.3506350Z processor : 40 2025-05-07T19:42:58.3506451Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3506547Z cpu family : 6 2025-05-07T19:42:58.3506664Z model : 85 2025-05-07T19:42:58.3506839Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3506934Z stepping : 7 2025-05-07T19:42:58.3507058Z microcode : 0x5003901 2025-05-07T19:42:58.3507152Z cpu MHz : 3261.507 2025-05-07T19:42:58.3507250Z cache size : 36608 KB 2025-05-07T19:42:58.3507344Z physical id : 1 2025-05-07T19:42:58.3507462Z siblings : 48 2025-05-07T19:42:58.3507665Z core id : 16 2025-05-07T19:42:58.3507753Z cpu cores : 24 2025-05-07T19:42:58.3507859Z apicid : 96 2025-05-07T19:42:58.3507953Z initial apicid : 96 2025-05-07T19:42:58.3508039Z fpu : yes 2025-05-07T19:42:58.3508131Z fpu_exception : yes 2025-05-07T19:42:58.3508246Z cpuid level : 13 2025-05-07T19:42:58.3508339Z wp : yes 2025-05-07T19:42:58.3510377Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3510778Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3510872Z bogomips : 6000.01 2025-05-07T19:42:58.3510968Z clflush size : 64 2025-05-07T19:42:58.3511082Z cache_alignment : 64 2025-05-07T19:42:58.3511219Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3511313Z power management: 2025-05-07T19:42:58.3511317Z 2025-05-07T19:42:58.3511433Z processor : 41 2025-05-07T19:42:58.3511528Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3511615Z cpu family : 6 2025-05-07T19:42:58.3511706Z model : 85 2025-05-07T19:42:58.3511887Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3511975Z stepping : 7 2025-05-07T19:42:58.3512068Z microcode : 0x5003901 2025-05-07T19:42:58.3512182Z cpu MHz : 3000.006 2025-05-07T19:42:58.3512270Z cache size : 36608 KB 2025-05-07T19:42:58.3512360Z physical id : 1 2025-05-07T19:42:58.3512444Z siblings : 48 2025-05-07T19:42:58.3512556Z core id : 17 2025-05-07T19:42:58.3512644Z cpu cores : 24 2025-05-07T19:42:58.3512729Z apicid : 98 2025-05-07T19:42:58.3512845Z initial apicid : 98 2025-05-07T19:42:58.3512928Z fpu : yes 2025-05-07T19:42:58.3513023Z fpu_exception : yes 2025-05-07T19:42:58.3513111Z cpuid level : 13 2025-05-07T19:42:58.3513217Z wp : yes 2025-05-07T19:42:58.3515229Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3515682Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3515772Z bogomips : 6000.01 2025-05-07T19:42:58.3515864Z clflush size : 64 2025-05-07T19:42:58.3515956Z cache_alignment : 64 2025-05-07T19:42:58.3516114Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3516210Z power management: 2025-05-07T19:42:58.3516262Z 2025-05-07T19:42:58.3516354Z processor : 42 2025-05-07T19:42:58.3516480Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3516571Z cpu family : 6 2025-05-07T19:42:58.3516660Z model : 85 2025-05-07T19:42:58.3516821Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3516932Z stepping : 7 2025-05-07T19:42:58.3517023Z microcode : 0x5003901 2025-05-07T19:42:58.3517108Z cpu MHz : 3000.006 2025-05-07T19:42:58.3517224Z cache size : 36608 KB 2025-05-07T19:42:58.3517311Z physical id : 1 2025-05-07T19:42:58.3517398Z siblings : 48 2025-05-07T19:42:58.3517486Z core id : 18 2025-05-07T19:42:58.3517596Z cpu cores : 24 2025-05-07T19:42:58.3517684Z apicid : 100 2025-05-07T19:42:58.3517774Z initial apicid : 100 2025-05-07T19:42:58.3517880Z fpu : yes 2025-05-07T19:42:58.3517970Z fpu_exception : yes 2025-05-07T19:42:58.3518059Z cpuid level : 13 2025-05-07T19:42:58.3518142Z wp : yes 2025-05-07T19:42:58.3520184Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3520556Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3520670Z bogomips : 6000.01 2025-05-07T19:42:58.3520759Z clflush size : 64 2025-05-07T19:42:58.3520849Z cache_alignment : 64 2025-05-07T19:42:58.3520981Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3521096Z power management: 2025-05-07T19:42:58.3521100Z 2025-05-07T19:42:58.3521190Z processor : 43 2025-05-07T19:42:58.3521284Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3521396Z cpu family : 6 2025-05-07T19:42:58.3521481Z model : 85 2025-05-07T19:42:58.3521638Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3521729Z stepping : 7 2025-05-07T19:42:58.3521843Z microcode : 0x5003901 2025-05-07T19:42:58.3521929Z cpu MHz : 3000.006 2025-05-07T19:42:58.3522019Z cache size : 36608 KB 2025-05-07T19:42:58.3522130Z physical id : 1 2025-05-07T19:42:58.3522214Z siblings : 48 2025-05-07T19:42:58.3522296Z core id : 19 2025-05-07T19:42:58.3522383Z cpu cores : 24 2025-05-07T19:42:58.3522491Z apicid : 102 2025-05-07T19:42:58.3522581Z initial apicid : 102 2025-05-07T19:42:58.3522664Z fpu : yes 2025-05-07T19:42:58.3522778Z fpu_exception : yes 2025-05-07T19:42:58.3522866Z cpuid level : 13 2025-05-07T19:42:58.3522950Z wp : yes 2025-05-07T19:42:58.3524980Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3525420Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3525511Z bogomips : 6000.01 2025-05-07T19:42:58.3525626Z clflush size : 64 2025-05-07T19:42:58.3525717Z cache_alignment : 64 2025-05-07T19:42:58.3525848Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3525939Z power management: 2025-05-07T19:42:58.3525943Z 2025-05-07T19:42:58.3526055Z processor : 44 2025-05-07T19:42:58.3526199Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3526289Z cpu family : 6 2025-05-07T19:42:58.3526402Z model : 85 2025-05-07T19:42:58.3526561Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3526649Z stepping : 7 2025-05-07T19:42:58.3526745Z microcode : 0x5003901 2025-05-07T19:42:58.3526855Z cpu MHz : 3000.006 2025-05-07T19:42:58.3526946Z cache size : 36608 KB 2025-05-07T19:42:58.3527035Z physical id : 1 2025-05-07T19:42:58.3527152Z siblings : 48 2025-05-07T19:42:58.3527243Z core id : 20 2025-05-07T19:42:58.3527333Z cpu cores : 24 2025-05-07T19:42:58.3527420Z apicid : 104 2025-05-07T19:42:58.3527538Z initial apicid : 104 2025-05-07T19:42:58.3527624Z fpu : yes 2025-05-07T19:42:58.3527717Z fpu_exception : yes 2025-05-07T19:42:58.3527807Z cpuid level : 13 2025-05-07T19:42:58.3527914Z wp : yes 2025-05-07T19:42:58.3529942Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3530341Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3530431Z bogomips : 6000.01 2025-05-07T19:42:58.3530520Z clflush size : 64 2025-05-07T19:42:58.3530614Z cache_alignment : 64 2025-05-07T19:42:58.3530774Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3530869Z power management: 2025-05-07T19:42:58.3530873Z 2025-05-07T19:42:58.3530959Z processor : 45 2025-05-07T19:42:58.3531081Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3531171Z cpu family : 6 2025-05-07T19:42:58.3531258Z model : 85 2025-05-07T19:42:58.3531443Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3531531Z stepping : 7 2025-05-07T19:42:58.3531625Z microcode : 0x5003901 2025-05-07T19:42:58.3531718Z cpu MHz : 3000.006 2025-05-07T19:42:58.3531834Z cache size : 36608 KB 2025-05-07T19:42:58.3531925Z physical id : 1 2025-05-07T19:42:58.3532015Z siblings : 48 2025-05-07T19:42:58.3532102Z core id : 21 2025-05-07T19:42:58.3532217Z cpu cores : 24 2025-05-07T19:42:58.3532308Z apicid : 106 2025-05-07T19:42:58.3532404Z initial apicid : 106 2025-05-07T19:42:58.3532522Z fpu : yes 2025-05-07T19:42:58.3532615Z fpu_exception : yes 2025-05-07T19:42:58.3532705Z cpuid level : 13 2025-05-07T19:42:58.3532793Z wp : yes 2025-05-07T19:42:58.3534920Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3535531Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3535659Z bogomips : 6000.01 2025-05-07T19:42:58.3535763Z clflush size : 64 2025-05-07T19:42:58.3535864Z cache_alignment : 64 2025-05-07T19:42:58.3536009Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3536138Z power management: 2025-05-07T19:42:58.3536142Z 2025-05-07T19:42:58.3536292Z processor : 46 2025-05-07T19:42:58.3536400Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3536527Z cpu family : 6 2025-05-07T19:42:58.3536624Z model : 85 2025-05-07T19:42:58.3536868Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3536991Z stepping : 7 2025-05-07T19:42:58.3537092Z microcode : 0x5003901 2025-05-07T19:42:58.3537193Z cpu MHz : 3000.006 2025-05-07T19:42:58.3537298Z cache size : 36608 KB 2025-05-07T19:42:58.3537431Z physical id : 1 2025-05-07T19:42:58.3537526Z siblings : 48 2025-05-07T19:42:58.3537623Z core id : 22 2025-05-07T19:42:58.3537722Z cpu cores : 24 2025-05-07T19:42:58.3537850Z apicid : 108 2025-05-07T19:42:58.3537955Z initial apicid : 108 2025-05-07T19:42:58.3538050Z fpu : yes 2025-05-07T19:42:58.3538181Z fpu_exception : yes 2025-05-07T19:42:58.3538279Z cpuid level : 13 2025-05-07T19:42:58.3538374Z wp : yes 2025-05-07T19:42:58.3540590Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3540997Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3541099Z bogomips : 6000.01 2025-05-07T19:42:58.3541225Z clflush size : 64 2025-05-07T19:42:58.3541330Z cache_alignment : 64 2025-05-07T19:42:58.3541474Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3541576Z power management: 2025-05-07T19:42:58.3541608Z 2025-05-07T19:42:58.3541707Z processor : 47 2025-05-07T19:42:58.3541815Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3541918Z cpu family : 6 2025-05-07T19:42:58.3542045Z model : 85 2025-05-07T19:42:58.3542222Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3542314Z stepping : 7 2025-05-07T19:42:58.3542409Z microcode : 0x5003901 2025-05-07T19:42:58.3542522Z cpu MHz : 3000.006 2025-05-07T19:42:58.3542618Z cache size : 36608 KB 2025-05-07T19:42:58.3542716Z physical id : 1 2025-05-07T19:42:58.3542840Z siblings : 48 2025-05-07T19:42:58.3542930Z core id : 23 2025-05-07T19:42:58.3543022Z cpu cores : 24 2025-05-07T19:42:58.3543113Z apicid : 110 2025-05-07T19:42:58.3543231Z initial apicid : 110 2025-05-07T19:42:58.3543320Z fpu : yes 2025-05-07T19:42:58.3543418Z fpu_exception : yes 2025-05-07T19:42:58.3543535Z cpuid level : 13 2025-05-07T19:42:58.3543623Z wp : yes 2025-05-07T19:42:58.3545814Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3546287Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3546385Z bogomips : 6000.01 2025-05-07T19:42:58.3546480Z clflush size : 64 2025-05-07T19:42:58.3546609Z cache_alignment : 64 2025-05-07T19:42:58.3546751Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3546849Z power management: 2025-05-07T19:42:58.3546854Z 2025-05-07T19:42:58.3546950Z processor : 48 2025-05-07T19:42:58.3547079Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3547174Z cpu family : 6 2025-05-07T19:42:58.3547268Z model : 85 2025-05-07T19:42:58.3547473Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3547722Z stepping : 7 2025-05-07T19:42:58.3547818Z microcode : 0x5003901 2025-05-07T19:42:58.3547912Z cpu MHz : 1195.679 2025-05-07T19:42:58.3564569Z cache size : 36608 KB 2025-05-07T19:42:58.3564743Z physical id : 0 2025-05-07T19:42:58.3564844Z siblings : 48 2025-05-07T19:42:58.3564937Z core id : 0 2025-05-07T19:42:58.3565014Z cpu cores : 24 2025-05-07T19:42:58.3565091Z apicid : 1 2025-05-07T19:42:58.3565187Z initial apicid : 1 2025-05-07T19:42:58.3565271Z fpu : yes 2025-05-07T19:42:58.3565357Z fpu_exception : yes 2025-05-07T19:42:58.3565436Z cpuid level : 13 2025-05-07T19:42:58.3565526Z wp : yes 2025-05-07T19:42:58.3567578Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3567954Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3568040Z bogomips : 6000.01 2025-05-07T19:42:58.3568122Z clflush size : 64 2025-05-07T19:42:58.3568203Z cache_alignment : 64 2025-05-07T19:42:58.3568343Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3568431Z power management: 2025-05-07T19:42:58.3568436Z 2025-05-07T19:42:58.3568517Z processor : 49 2025-05-07T19:42:58.3568620Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3568703Z cpu family : 6 2025-05-07T19:42:58.3568781Z model : 85 2025-05-07T19:42:58.3568936Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3569027Z stepping : 7 2025-05-07T19:42:58.3569117Z microcode : 0x5003901 2025-05-07T19:42:58.3569200Z cpu MHz : 1205.252 2025-05-07T19:42:58.3569292Z cache size : 36608 KB 2025-05-07T19:42:58.3569369Z physical id : 0 2025-05-07T19:42:58.3569446Z siblings : 48 2025-05-07T19:42:58.3569530Z core id : 1 2025-05-07T19:42:58.3569618Z cpu cores : 24 2025-05-07T19:42:58.3569696Z apicid : 3 2025-05-07T19:42:58.3569781Z initial apicid : 3 2025-05-07T19:42:58.3569872Z fpu : yes 2025-05-07T19:42:58.3569954Z fpu_exception : yes 2025-05-07T19:42:58.3570030Z cpuid level : 13 2025-05-07T19:42:58.3570109Z wp : yes 2025-05-07T19:42:58.3572145Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3572509Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3572739Z bogomips : 6000.01 2025-05-07T19:42:58.3572819Z clflush size : 64 2025-05-07T19:42:58.3572900Z cache_alignment : 64 2025-05-07T19:42:58.3573031Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3573129Z power management: 2025-05-07T19:42:58.3573135Z 2025-05-07T19:42:58.3573216Z processor : 50 2025-05-07T19:42:58.3573303Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3573405Z cpu family : 6 2025-05-07T19:42:58.3573480Z model : 85 2025-05-07T19:42:58.3573632Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3573712Z stepping : 7 2025-05-07T19:42:58.3573811Z microcode : 0x5003901 2025-05-07T19:42:58.3573944Z cpu MHz : 3000.006 2025-05-07T19:42:58.3574031Z cache size : 36608 KB 2025-05-07T19:42:58.3574129Z physical id : 0 2025-05-07T19:42:58.3574213Z siblings : 48 2025-05-07T19:42:58.3574291Z core id : 2 2025-05-07T19:42:58.3574378Z cpu cores : 24 2025-05-07T19:42:58.3574476Z apicid : 5 2025-05-07T19:42:58.3574562Z initial apicid : 5 2025-05-07T19:42:58.3574797Z fpu : yes 2025-05-07T19:42:58.3574971Z fpu_exception : yes 2025-05-07T19:42:58.3575221Z cpuid level : 13 2025-05-07T19:42:58.3575304Z wp : yes 2025-05-07T19:42:58.3577516Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3577913Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3578006Z bogomips : 6000.01 2025-05-07T19:42:58.3578115Z clflush size : 64 2025-05-07T19:42:58.3578207Z cache_alignment : 64 2025-05-07T19:42:58.3578344Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3578431Z power management: 2025-05-07T19:42:58.3578436Z 2025-05-07T19:42:58.3578540Z processor : 51 2025-05-07T19:42:58.3578628Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3578710Z cpu family : 6 2025-05-07T19:42:58.3578806Z model : 85 2025-05-07T19:42:58.3578966Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3579050Z stepping : 7 2025-05-07T19:42:58.3579139Z microcode : 0x5003901 2025-05-07T19:42:58.3579242Z cpu MHz : 3000.006 2025-05-07T19:42:58.3579330Z cache size : 36608 KB 2025-05-07T19:42:58.3579418Z physical id : 0 2025-05-07T19:42:58.3579517Z siblings : 48 2025-05-07T19:42:58.3579602Z core id : 3 2025-05-07T19:42:58.3579682Z cpu cores : 24 2025-05-07T19:42:58.3579761Z apicid : 7 2025-05-07T19:42:58.3579871Z initial apicid : 7 2025-05-07T19:42:58.3579952Z fpu : yes 2025-05-07T19:42:58.3580042Z fpu_exception : yes 2025-05-07T19:42:58.3580129Z cpuid level : 13 2025-05-07T19:42:58.3580229Z wp : yes 2025-05-07T19:42:58.3582425Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3582824Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3583015Z bogomips : 6000.01 2025-05-07T19:42:58.3583104Z clflush size : 64 2025-05-07T19:42:58.3583208Z cache_alignment : 64 2025-05-07T19:42:58.3583343Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3583434Z power management: 2025-05-07T19:42:58.3583439Z 2025-05-07T19:42:58.3583523Z processor : 52 2025-05-07T19:42:58.3583628Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3583713Z cpu family : 6 2025-05-07T19:42:58.3583798Z model : 85 2025-05-07T19:42:58.3583979Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3584064Z stepping : 7 2025-05-07T19:42:58.3584149Z microcode : 0x5003901 2025-05-07T19:42:58.3584232Z cpu MHz : 1289.935 2025-05-07T19:42:58.3584332Z cache size : 36608 KB 2025-05-07T19:42:58.3584487Z physical id : 0 2025-05-07T19:42:58.3584571Z siblings : 48 2025-05-07T19:42:58.3584671Z core id : 4 2025-05-07T19:42:58.3584752Z cpu cores : 24 2025-05-07T19:42:58.3584831Z apicid : 9 2025-05-07T19:42:58.3584914Z initial apicid : 9 2025-05-07T19:42:58.3585018Z fpu : yes 2025-05-07T19:42:58.3585109Z fpu_exception : yes 2025-05-07T19:42:58.3585194Z cpuid level : 13 2025-05-07T19:42:58.3585275Z wp : yes 2025-05-07T19:42:58.3587482Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3587984Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3588075Z bogomips : 6000.01 2025-05-07T19:42:58.3588155Z clflush size : 64 2025-05-07T19:42:58.3588239Z cache_alignment : 64 2025-05-07T19:42:58.3588362Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3588452Z power management: 2025-05-07T19:42:58.3588456Z 2025-05-07T19:42:58.3588537Z processor : 53 2025-05-07T19:42:58.3588622Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3588707Z cpu family : 6 2025-05-07T19:42:58.3588782Z model : 85 2025-05-07T19:42:58.3588932Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3589031Z stepping : 7 2025-05-07T19:42:58.3589112Z microcode : 0x5003901 2025-05-07T19:42:58.3589189Z cpu MHz : 3000.006 2025-05-07T19:42:58.3589267Z cache size : 36608 KB 2025-05-07T19:42:58.3589368Z physical id : 0 2025-05-07T19:42:58.3589443Z siblings : 48 2025-05-07T19:42:58.3589519Z core id : 5 2025-05-07T19:42:58.3589598Z cpu cores : 24 2025-05-07T19:42:58.3589687Z apicid : 11 2025-05-07T19:42:58.3589763Z initial apicid : 11 2025-05-07T19:42:58.3589834Z fpu : yes 2025-05-07T19:42:58.3589932Z fpu_exception : yes 2025-05-07T19:42:58.3590009Z cpuid level : 13 2025-05-07T19:42:58.3590082Z wp : yes 2025-05-07T19:42:58.3592118Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3592488Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3592568Z bogomips : 6000.01 2025-05-07T19:42:58.3592653Z clflush size : 64 2025-05-07T19:42:58.3592783Z cache_alignment : 64 2025-05-07T19:42:58.3592903Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3592983Z power management: 2025-05-07T19:42:58.3592996Z 2025-05-07T19:42:58.3593076Z processor : 54 2025-05-07T19:42:58.3593162Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3593238Z cpu family : 6 2025-05-07T19:42:58.3593325Z model : 85 2025-05-07T19:42:58.3593469Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3593544Z stepping : 7 2025-05-07T19:42:58.3593624Z microcode : 0x5003901 2025-05-07T19:42:58.3593709Z cpu MHz : 3000.006 2025-05-07T19:42:58.3593791Z cache size : 36608 KB 2025-05-07T19:42:58.3593871Z physical id : 0 2025-05-07T19:42:58.3593973Z siblings : 48 2025-05-07T19:42:58.3594052Z core id : 6 2025-05-07T19:42:58.3594197Z cpu cores : 24 2025-05-07T19:42:58.3594271Z apicid : 13 2025-05-07T19:42:58.3594374Z initial apicid : 13 2025-05-07T19:42:58.3594452Z fpu : yes 2025-05-07T19:42:58.3594533Z fpu_exception : yes 2025-05-07T19:42:58.3594629Z cpuid level : 13 2025-05-07T19:42:58.3594701Z wp : yes 2025-05-07T19:42:58.3596709Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3597082Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3597162Z bogomips : 6000.01 2025-05-07T19:42:58.3597241Z clflush size : 64 2025-05-07T19:42:58.3597332Z cache_alignment : 64 2025-05-07T19:42:58.3597457Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3597538Z power management: 2025-05-07T19:42:58.3597542Z 2025-05-07T19:42:58.3597618Z processor : 55 2025-05-07T19:42:58.3597726Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3597801Z cpu family : 6 2025-05-07T19:42:58.3597882Z model : 85 2025-05-07T19:42:58.3598046Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3598123Z stepping : 7 2025-05-07T19:42:58.3598205Z microcode : 0x5003901 2025-05-07T19:42:58.3598281Z cpu MHz : 1199.343 2025-05-07T19:42:58.3598366Z cache size : 36608 KB 2025-05-07T19:42:58.3598441Z physical id : 0 2025-05-07T19:42:58.3598518Z siblings : 48 2025-05-07T19:42:58.3598606Z core id : 7 2025-05-07T19:42:58.3598682Z cpu cores : 24 2025-05-07T19:42:58.3598756Z apicid : 15 2025-05-07T19:42:58.3598835Z initial apicid : 15 2025-05-07T19:42:58.3598919Z fpu : yes 2025-05-07T19:42:58.3598998Z fpu_exception : yes 2025-05-07T19:42:58.3599076Z cpuid level : 13 2025-05-07T19:42:58.3599172Z wp : yes 2025-05-07T19:42:58.3601185Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3601549Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3601648Z bogomips : 6000.01 2025-05-07T19:42:58.3601731Z clflush size : 64 2025-05-07T19:42:58.3601812Z cache_alignment : 64 2025-05-07T19:42:58.3601950Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3602081Z power management: 2025-05-07T19:42:58.3602085Z 2025-05-07T19:42:58.3602165Z processor : 56 2025-05-07T19:42:58.3602252Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3602347Z cpu family : 6 2025-05-07T19:42:58.3602427Z model : 85 2025-05-07T19:42:58.3602580Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3602677Z stepping : 7 2025-05-07T19:42:58.3602758Z microcode : 0x5003901 2025-05-07T19:42:58.3602839Z cpu MHz : 3000.006 2025-05-07T19:42:58.3602916Z cache size : 36608 KB 2025-05-07T19:42:58.3603009Z physical id : 0 2025-05-07T19:42:58.3603086Z siblings : 48 2025-05-07T19:42:58.3603162Z core id : 8 2025-05-07T19:42:58.3603247Z cpu cores : 24 2025-05-07T19:42:58.3603324Z apicid : 17 2025-05-07T19:42:58.3603454Z initial apicid : 17 2025-05-07T19:42:58.3603533Z fpu : yes 2025-05-07T19:42:58.3603627Z fpu_exception : yes 2025-05-07T19:42:58.3603707Z cpuid level : 13 2025-05-07T19:42:58.3603783Z wp : yes 2025-05-07T19:42:58.3605806Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3606165Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3606243Z bogomips : 6000.01 2025-05-07T19:42:58.3606342Z clflush size : 64 2025-05-07T19:42:58.3606420Z cache_alignment : 64 2025-05-07T19:42:58.3606541Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3606630Z power management: 2025-05-07T19:42:58.3606637Z 2025-05-07T19:42:58.3606710Z processor : 57 2025-05-07T19:42:58.3606792Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3606873Z cpu family : 6 2025-05-07T19:42:58.3606955Z model : 85 2025-05-07T19:42:58.3607101Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3607175Z stepping : 7 2025-05-07T19:42:58.3607265Z microcode : 0x5003901 2025-05-07T19:42:58.3607343Z cpu MHz : 1200.863 2025-05-07T19:42:58.3607424Z cache size : 36608 KB 2025-05-07T19:42:58.3607506Z physical id : 0 2025-05-07T19:42:58.3607598Z siblings : 48 2025-05-07T19:42:58.3607673Z core id : 9 2025-05-07T19:42:58.3607752Z cpu cores : 24 2025-05-07T19:42:58.3607838Z apicid : 19 2025-05-07T19:42:58.3607918Z initial apicid : 19 2025-05-07T19:42:58.3607991Z fpu : yes 2025-05-07T19:42:58.3608075Z fpu_exception : yes 2025-05-07T19:42:58.3608161Z cpuid level : 13 2025-05-07T19:42:58.3608236Z wp : yes 2025-05-07T19:42:58.3610247Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3610634Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3610714Z bogomips : 6000.01 2025-05-07T19:42:58.3610794Z clflush size : 64 2025-05-07T19:42:58.3610900Z cache_alignment : 64 2025-05-07T19:42:58.3611022Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3611103Z power management: 2025-05-07T19:42:58.3611107Z 2025-05-07T19:42:58.3611258Z processor : 58 2025-05-07T19:42:58.3611348Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3611428Z cpu family : 6 2025-05-07T19:42:58.3611508Z model : 85 2025-05-07T19:42:58.3611674Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3611750Z stepping : 7 2025-05-07T19:42:58.3611835Z microcode : 0x5003901 2025-05-07T19:42:58.3611921Z cpu MHz : 1197.710 2025-05-07T19:42:58.3612004Z cache size : 36608 KB 2025-05-07T19:42:58.3612084Z physical id : 0 2025-05-07T19:42:58.3612160Z siblings : 48 2025-05-07T19:42:58.3612252Z core id : 10 2025-05-07T19:42:58.3612333Z cpu cores : 24 2025-05-07T19:42:58.3612409Z apicid : 21 2025-05-07T19:42:58.3612491Z initial apicid : 21 2025-05-07T19:42:58.3612582Z fpu : yes 2025-05-07T19:42:58.3612665Z fpu_exception : yes 2025-05-07T19:42:58.3612793Z cpuid level : 13 2025-05-07T19:42:58.3612884Z wp : yes 2025-05-07T19:42:58.3614966Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3615512Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3615612Z bogomips : 6000.01 2025-05-07T19:42:58.3615698Z clflush size : 64 2025-05-07T19:42:58.3615783Z cache_alignment : 64 2025-05-07T19:42:58.3615938Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3616028Z power management: 2025-05-07T19:42:58.3616032Z 2025-05-07T19:42:58.3616118Z processor : 59 2025-05-07T19:42:58.3616298Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3616378Z cpu family : 6 2025-05-07T19:42:58.3616460Z model : 85 2025-05-07T19:42:58.3616619Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3616717Z stepping : 7 2025-05-07T19:42:58.3616806Z microcode : 0x5003901 2025-05-07T19:42:58.3616883Z cpu MHz : 1199.380 2025-05-07T19:42:58.3616979Z cache size : 36608 KB 2025-05-07T19:42:58.3617059Z physical id : 0 2025-05-07T19:42:58.3617138Z siblings : 48 2025-05-07T19:42:58.3617219Z core id : 11 2025-05-07T19:42:58.3617314Z cpu cores : 24 2025-05-07T19:42:58.3617396Z apicid : 23 2025-05-07T19:42:58.3617485Z initial apicid : 23 2025-05-07T19:42:58.3617564Z fpu : yes 2025-05-07T19:42:58.3617664Z fpu_exception : yes 2025-05-07T19:42:58.3617747Z cpuid level : 13 2025-05-07T19:42:58.3617841Z wp : yes 2025-05-07T19:42:58.3620013Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3620405Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3620498Z bogomips : 6000.01 2025-05-07T19:42:58.3620578Z clflush size : 64 2025-05-07T19:42:58.3620661Z cache_alignment : 64 2025-05-07T19:42:58.3620799Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3620886Z power management: 2025-05-07T19:42:58.3620890Z 2025-05-07T19:42:58.3620978Z processor : 60 2025-05-07T19:42:58.3621068Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3621231Z cpu family : 6 2025-05-07T19:42:58.3621307Z model : 85 2025-05-07T19:42:58.3621464Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3621549Z stepping : 7 2025-05-07T19:42:58.3621631Z microcode : 0x5003901 2025-05-07T19:42:58.3621716Z cpu MHz : 3000.006 2025-05-07T19:42:58.3621797Z cache size : 36608 KB 2025-05-07T19:42:58.3621887Z physical id : 0 2025-05-07T19:42:58.3621966Z siblings : 48 2025-05-07T19:42:58.3622052Z core id : 12 2025-05-07T19:42:58.3622146Z cpu cores : 24 2025-05-07T19:42:58.3622224Z apicid : 25 2025-05-07T19:42:58.3622309Z initial apicid : 25 2025-05-07T19:42:58.3622388Z fpu : yes 2025-05-07T19:42:58.3622482Z fpu_exception : yes 2025-05-07T19:42:58.3622572Z cpuid level : 13 2025-05-07T19:42:58.3622653Z wp : yes 2025-05-07T19:42:58.3624876Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3625269Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3625354Z bogomips : 6000.01 2025-05-07T19:42:58.3625439Z clflush size : 64 2025-05-07T19:42:58.3625524Z cache_alignment : 64 2025-05-07T19:42:58.3625653Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3625751Z power management: 2025-05-07T19:42:58.3625756Z 2025-05-07T19:42:58.3625841Z processor : 61 2025-05-07T19:42:58.3625931Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3626012Z cpu family : 6 2025-05-07T19:42:58.3626113Z model : 85 2025-05-07T19:42:58.3626280Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3626360Z stepping : 7 2025-05-07T19:42:58.3626465Z microcode : 0x5003901 2025-05-07T19:42:58.3626550Z cpu MHz : 3000.006 2025-05-07T19:42:58.3626633Z cache size : 36608 KB 2025-05-07T19:42:58.3626714Z physical id : 0 2025-05-07T19:42:58.3626804Z siblings : 48 2025-05-07T19:42:58.3626884Z core id : 13 2025-05-07T19:42:58.3626966Z cpu cores : 24 2025-05-07T19:42:58.3627064Z apicid : 27 2025-05-07T19:42:58.3627152Z initial apicid : 27 2025-05-07T19:42:58.3627230Z fpu : yes 2025-05-07T19:42:58.3627314Z fpu_exception : yes 2025-05-07T19:42:58.3627404Z cpuid level : 13 2025-05-07T19:42:58.3627482Z wp : yes 2025-05-07T19:42:58.3629605Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3629979Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3630060Z bogomips : 6000.01 2025-05-07T19:42:58.3630135Z clflush size : 64 2025-05-07T19:42:58.3630216Z cache_alignment : 64 2025-05-07T19:42:58.3630333Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3630412Z power management: 2025-05-07T19:42:58.3630416Z 2025-05-07T19:42:58.3630503Z processor : 62 2025-05-07T19:42:58.3630581Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3630654Z cpu family : 6 2025-05-07T19:42:58.3630725Z model : 85 2025-05-07T19:42:58.3630881Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3631027Z stepping : 7 2025-05-07T19:42:58.3631106Z microcode : 0x5003901 2025-05-07T19:42:58.3631199Z cpu MHz : 3000.006 2025-05-07T19:42:58.3631278Z cache size : 36608 KB 2025-05-07T19:42:58.3631357Z physical id : 0 2025-05-07T19:42:58.3631434Z siblings : 48 2025-05-07T19:42:58.3631522Z core id : 14 2025-05-07T19:42:58.3631596Z cpu cores : 24 2025-05-07T19:42:58.3631672Z apicid : 29 2025-05-07T19:42:58.3631766Z initial apicid : 29 2025-05-07T19:42:58.3631841Z fpu : yes 2025-05-07T19:42:58.3631926Z fpu_exception : yes 2025-05-07T19:42:58.3632002Z cpuid level : 13 2025-05-07T19:42:58.3632087Z wp : yes 2025-05-07T19:42:58.3634148Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3634524Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3634605Z bogomips : 6000.01 2025-05-07T19:42:58.3634685Z clflush size : 64 2025-05-07T19:42:58.3634768Z cache_alignment : 64 2025-05-07T19:42:58.3634902Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3634991Z power management: 2025-05-07T19:42:58.3634995Z 2025-05-07T19:42:58.3635070Z processor : 63 2025-05-07T19:42:58.3635170Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3635250Z cpu family : 6 2025-05-07T19:42:58.3635326Z model : 85 2025-05-07T19:42:58.3635473Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3635568Z stepping : 7 2025-05-07T19:42:58.3635650Z microcode : 0x5003901 2025-05-07T19:42:58.3635726Z cpu MHz : 1199.777 2025-05-07T19:42:58.3635816Z cache size : 36608 KB 2025-05-07T19:42:58.3635899Z physical id : 0 2025-05-07T19:42:58.3635980Z siblings : 48 2025-05-07T19:42:58.3636055Z core id : 15 2025-05-07T19:42:58.3636144Z cpu cores : 24 2025-05-07T19:42:58.3636221Z apicid : 31 2025-05-07T19:42:58.3636300Z initial apicid : 31 2025-05-07T19:42:58.3636386Z fpu : yes 2025-05-07T19:42:58.3636470Z fpu_exception : yes 2025-05-07T19:42:58.3636553Z cpuid level : 13 2025-05-07T19:42:58.3636622Z wp : yes 2025-05-07T19:42:58.3638666Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3639026Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3639123Z bogomips : 6000.01 2025-05-07T19:42:58.3639204Z clflush size : 64 2025-05-07T19:42:58.3639287Z cache_alignment : 64 2025-05-07T19:42:58.3639409Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3639495Z power management: 2025-05-07T19:42:58.3639499Z 2025-05-07T19:42:58.3639570Z processor : 64 2025-05-07T19:42:58.3639653Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3639745Z cpu family : 6 2025-05-07T19:42:58.3639822Z model : 85 2025-05-07T19:42:58.3639970Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3640044Z stepping : 7 2025-05-07T19:42:58.3640190Z microcode : 0x5003901 2025-05-07T19:42:58.3640268Z cpu MHz : 3000.006 2025-05-07T19:42:58.3640344Z cache size : 36608 KB 2025-05-07T19:42:58.3640433Z physical id : 0 2025-05-07T19:42:58.3640510Z siblings : 48 2025-05-07T19:42:58.3640584Z core id : 16 2025-05-07T19:42:58.3640657Z cpu cores : 24 2025-05-07T19:42:58.3640751Z apicid : 33 2025-05-07T19:42:58.3640830Z initial apicid : 33 2025-05-07T19:42:58.3640901Z fpu : yes 2025-05-07T19:42:58.3640984Z fpu_exception : yes 2025-05-07T19:42:58.3641075Z cpuid level : 13 2025-05-07T19:42:58.3641156Z wp : yes 2025-05-07T19:42:58.3643199Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3643573Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3643649Z bogomips : 6000.01 2025-05-07T19:42:58.3643726Z clflush size : 64 2025-05-07T19:42:58.3643815Z cache_alignment : 64 2025-05-07T19:42:58.3643947Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3644024Z power management: 2025-05-07T19:42:58.3644028Z 2025-05-07T19:42:58.3644112Z processor : 65 2025-05-07T19:42:58.3644195Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3644269Z cpu family : 6 2025-05-07T19:42:58.3644358Z model : 85 2025-05-07T19:42:58.3644507Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3644585Z stepping : 7 2025-05-07T19:42:58.3644665Z microcode : 0x5003901 2025-05-07T19:42:58.3644762Z cpu MHz : 3000.006 2025-05-07T19:42:58.3644846Z cache size : 36608 KB 2025-05-07T19:42:58.3644923Z physical id : 0 2025-05-07T19:42:58.3645005Z siblings : 48 2025-05-07T19:42:58.3645092Z core id : 17 2025-05-07T19:42:58.3645171Z cpu cores : 24 2025-05-07T19:42:58.3645243Z apicid : 35 2025-05-07T19:42:58.3645343Z initial apicid : 35 2025-05-07T19:42:58.3645418Z fpu : yes 2025-05-07T19:42:58.3645498Z fpu_exception : yes 2025-05-07T19:42:58.3645574Z cpuid level : 13 2025-05-07T19:42:58.3645658Z wp : yes 2025-05-07T19:42:58.3647674Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3648044Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3648125Z bogomips : 6000.01 2025-05-07T19:42:58.3648203Z clflush size : 64 2025-05-07T19:42:58.3648282Z cache_alignment : 64 2025-05-07T19:42:58.3648417Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3648497Z power management: 2025-05-07T19:42:58.3648501Z 2025-05-07T19:42:58.3648579Z processor : 66 2025-05-07T19:42:58.3648667Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3648740Z cpu family : 6 2025-05-07T19:42:58.3648810Z model : 85 2025-05-07T19:42:58.3648957Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3649050Z stepping : 7 2025-05-07T19:42:58.3649132Z microcode : 0x5003901 2025-05-07T19:42:58.3649210Z cpu MHz : 1199.400 2025-05-07T19:42:58.3649604Z cache size : 36608 KB 2025-05-07T19:42:58.3649689Z physical id : 0 2025-05-07T19:42:58.3649763Z siblings : 48 2025-05-07T19:42:58.3649837Z core id : 18 2025-05-07T19:42:58.3649929Z cpu cores : 24 2025-05-07T19:42:58.3650010Z apicid : 37 2025-05-07T19:42:58.3650088Z initial apicid : 37 2025-05-07T19:42:58.3650173Z fpu : yes 2025-05-07T19:42:58.3650253Z fpu_exception : yes 2025-05-07T19:42:58.3650329Z cpuid level : 13 2025-05-07T19:42:58.3650399Z wp : yes 2025-05-07T19:42:58.3652453Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3652823Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3652922Z bogomips : 6000.01 2025-05-07T19:42:58.3653000Z clflush size : 64 2025-05-07T19:42:58.3653079Z cache_alignment : 64 2025-05-07T19:42:58.3653198Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3653292Z power management: 2025-05-07T19:42:58.3653296Z 2025-05-07T19:42:58.3653377Z processor : 67 2025-05-07T19:42:58.3653462Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3653541Z cpu family : 6 2025-05-07T19:42:58.3653611Z model : 85 2025-05-07T19:42:58.3653910Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3653990Z stepping : 7 2025-05-07T19:42:58.3654077Z microcode : 0x5003901 2025-05-07T19:42:58.3654160Z cpu MHz : 3000.006 2025-05-07T19:42:58.3654240Z cache size : 36608 KB 2025-05-07T19:42:58.3654337Z physical id : 0 2025-05-07T19:42:58.3654411Z siblings : 48 2025-05-07T19:42:58.3654487Z core id : 19 2025-05-07T19:42:58.3654563Z cpu cores : 24 2025-05-07T19:42:58.3654653Z apicid : 39 2025-05-07T19:42:58.3654740Z initial apicid : 39 2025-05-07T19:42:58.3654880Z fpu : yes 2025-05-07T19:42:58.3654991Z fpu_exception : yes 2025-05-07T19:42:58.3655070Z cpuid level : 13 2025-05-07T19:42:58.3655149Z wp : yes 2025-05-07T19:42:58.3657502Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3657894Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3657977Z bogomips : 6000.01 2025-05-07T19:42:58.3658073Z clflush size : 64 2025-05-07T19:42:58.3658161Z cache_alignment : 64 2025-05-07T19:42:58.3658295Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3658382Z power management: 2025-05-07T19:42:58.3658387Z 2025-05-07T19:42:58.3658477Z processor : 68 2025-05-07T19:42:58.3658568Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3658649Z cpu family : 6 2025-05-07T19:42:58.3658737Z model : 85 2025-05-07T19:42:58.3658898Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3658980Z stepping : 7 2025-05-07T19:42:58.3659066Z microcode : 0x5003901 2025-05-07T19:42:58.3659158Z cpu MHz : 1198.550 2025-05-07T19:42:58.3659246Z cache size : 36608 KB 2025-05-07T19:42:58.3659329Z physical id : 0 2025-05-07T19:42:58.3659421Z siblings : 48 2025-05-07T19:42:58.3659557Z core id : 20 2025-05-07T19:42:58.3659643Z cpu cores : 24 2025-05-07T19:42:58.3659720Z apicid : 41 2025-05-07T19:42:58.3659811Z initial apicid : 41 2025-05-07T19:42:58.3659887Z fpu : yes 2025-05-07T19:42:58.3659973Z fpu_exception : yes 2025-05-07T19:42:58.3660062Z cpuid level : 13 2025-05-07T19:42:58.3660140Z wp : yes 2025-05-07T19:42:58.3662426Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3662831Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3662917Z bogomips : 6000.01 2025-05-07T19:42:58.3662999Z clflush size : 64 2025-05-07T19:42:58.3663100Z cache_alignment : 64 2025-05-07T19:42:58.3663229Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3663312Z power management: 2025-05-07T19:42:58.3663316Z 2025-05-07T19:42:58.3663397Z processor : 69 2025-05-07T19:42:58.3663496Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3663579Z cpu family : 6 2025-05-07T19:42:58.3663659Z model : 85 2025-05-07T19:42:58.3663832Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3663909Z stepping : 7 2025-05-07T19:42:58.3663993Z microcode : 0x5003901 2025-05-07T19:42:58.3664076Z cpu MHz : 1199.103 2025-05-07T19:42:58.3664177Z cache size : 36608 KB 2025-05-07T19:42:58.3664256Z physical id : 0 2025-05-07T19:42:58.3664335Z siblings : 48 2025-05-07T19:42:58.3664425Z core id : 21 2025-05-07T19:42:58.3664510Z cpu cores : 24 2025-05-07T19:42:58.3664590Z apicid : 43 2025-05-07T19:42:58.3664673Z initial apicid : 43 2025-05-07T19:42:58.3664762Z fpu : yes 2025-05-07T19:42:58.3664845Z fpu_exception : yes 2025-05-07T19:42:58.3664926Z cpuid level : 13 2025-05-07T19:42:58.3664998Z wp : yes 2025-05-07T19:42:58.3667188Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3667678Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3667768Z bogomips : 6000.01 2025-05-07T19:42:58.3667850Z clflush size : 64 2025-05-07T19:42:58.3667925Z cache_alignment : 64 2025-05-07T19:42:58.3668048Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3668128Z power management: 2025-05-07T19:42:58.3668132Z 2025-05-07T19:42:58.3668203Z processor : 70 2025-05-07T19:42:58.3668282Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3668372Z cpu family : 6 2025-05-07T19:42:58.3668444Z model : 85 2025-05-07T19:42:58.3668588Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3668667Z stepping : 7 2025-05-07T19:42:58.3668744Z microcode : 0x5003901 2025-05-07T19:42:58.3668820Z cpu MHz : 3000.006 2025-05-07T19:42:58.3668912Z cache size : 36608 KB 2025-05-07T19:42:58.3669027Z physical id : 0 2025-05-07T19:42:58.3669112Z siblings : 48 2025-05-07T19:42:58.3669201Z core id : 22 2025-05-07T19:42:58.3669317Z cpu cores : 24 2025-05-07T19:42:58.3669498Z apicid : 45 2025-05-07T19:42:58.3669591Z initial apicid : 45 2025-05-07T19:42:58.3669678Z fpu : yes 2025-05-07T19:42:58.3669794Z fpu_exception : yes 2025-05-07T19:42:58.3669888Z cpuid level : 13 2025-05-07T19:42:58.3669973Z wp : yes 2025-05-07T19:42:58.3672051Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3672425Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3672521Z bogomips : 6000.01 2025-05-07T19:42:58.3672636Z clflush size : 64 2025-05-07T19:42:58.3672729Z cache_alignment : 64 2025-05-07T19:42:58.3672864Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3672985Z power management: 2025-05-07T19:42:58.3672989Z 2025-05-07T19:42:58.3673078Z processor : 71 2025-05-07T19:42:58.3673170Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3673258Z cpu family : 6 2025-05-07T19:42:58.3673371Z model : 85 2025-05-07T19:42:58.3673531Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3673618Z stepping : 7 2025-05-07T19:42:58.3673736Z microcode : 0x5003901 2025-05-07T19:42:58.3673827Z cpu MHz : 1204.352 2025-05-07T19:42:58.3673919Z cache size : 36608 KB 2025-05-07T19:42:58.3674011Z physical id : 0 2025-05-07T19:42:58.3674121Z siblings : 48 2025-05-07T19:42:58.3674206Z core id : 23 2025-05-07T19:42:58.3674293Z cpu cores : 24 2025-05-07T19:42:58.3674378Z apicid : 47 2025-05-07T19:42:58.3674499Z initial apicid : 47 2025-05-07T19:42:58.3674583Z fpu : yes 2025-05-07T19:42:58.3674832Z fpu_exception : yes 2025-05-07T19:42:58.3674951Z cpuid level : 13 2025-05-07T19:42:58.3675209Z wp : yes 2025-05-07T19:42:58.3677399Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3677825Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3677929Z bogomips : 6000.01 2025-05-07T19:42:58.3678024Z clflush size : 64 2025-05-07T19:42:58.3678146Z cache_alignment : 64 2025-05-07T19:42:58.3678282Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3678379Z power management: 2025-05-07T19:42:58.3678383Z 2025-05-07T19:42:58.3678500Z processor : 72 2025-05-07T19:42:58.3678599Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3678691Z cpu family : 6 2025-05-07T19:42:58.3678775Z model : 85 2025-05-07T19:42:58.3678967Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3679056Z stepping : 7 2025-05-07T19:42:58.3679154Z microcode : 0x5003901 2025-05-07T19:42:58.3679245Z cpu MHz : 3291.290 2025-05-07T19:42:58.3679363Z cache size : 36608 KB 2025-05-07T19:42:58.3679457Z physical id : 1 2025-05-07T19:42:58.3679557Z siblings : 48 2025-05-07T19:42:58.3679671Z core id : 0 2025-05-07T19:42:58.3679762Z cpu cores : 24 2025-05-07T19:42:58.3679854Z apicid : 65 2025-05-07T19:42:58.3679977Z initial apicid : 65 2025-05-07T19:42:58.3680195Z fpu : yes 2025-05-07T19:42:58.3680292Z fpu_exception : yes 2025-05-07T19:42:58.3680388Z cpuid level : 13 2025-05-07T19:42:58.3680507Z wp : yes 2025-05-07T19:42:58.3682735Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3683348Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3683478Z bogomips : 6000.01 2025-05-07T19:42:58.3683577Z clflush size : 64 2025-05-07T19:42:58.3683677Z cache_alignment : 64 2025-05-07T19:42:58.3683848Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3683951Z power management: 2025-05-07T19:42:58.3683955Z 2025-05-07T19:42:58.3684048Z processor : 73 2025-05-07T19:42:58.3684175Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3684268Z cpu family : 6 2025-05-07T19:42:58.3684359Z model : 85 2025-05-07T19:42:58.3684540Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3684657Z stepping : 7 2025-05-07T19:42:58.3684758Z microcode : 0x5003901 2025-05-07T19:42:58.3684856Z cpu MHz : 3000.006 2025-05-07T19:42:58.3684954Z cache size : 36608 KB 2025-05-07T19:42:58.3685073Z physical id : 1 2025-05-07T19:42:58.3685164Z siblings : 48 2025-05-07T19:42:58.3685256Z core id : 1 2025-05-07T19:42:58.3685368Z cpu cores : 24 2025-05-07T19:42:58.3685459Z apicid : 67 2025-05-07T19:42:58.3685556Z initial apicid : 67 2025-05-07T19:42:58.3685646Z fpu : yes 2025-05-07T19:42:58.3685770Z fpu_exception : yes 2025-05-07T19:42:58.3685867Z cpuid level : 13 2025-05-07T19:42:58.3685957Z wp : yes 2025-05-07T19:42:58.3688171Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3688597Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3688699Z bogomips : 6000.01 2025-05-07T19:42:58.3688822Z clflush size : 64 2025-05-07T19:42:58.3688924Z cache_alignment : 64 2025-05-07T19:42:58.3689073Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3689197Z power management: 2025-05-07T19:42:58.3689201Z 2025-05-07T19:42:58.3689293Z processor : 74 2025-05-07T19:42:58.3689393Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3689494Z cpu family : 6 2025-05-07T19:42:58.3689612Z model : 85 2025-05-07T19:42:58.3689791Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3689885Z stepping : 7 2025-05-07T19:42:58.3690004Z microcode : 0x5003901 2025-05-07T19:42:58.3690105Z cpu MHz : 3000.006 2025-05-07T19:42:58.3690202Z cache size : 36608 KB 2025-05-07T19:42:58.3690297Z physical id : 1 2025-05-07T19:42:58.3690401Z siblings : 48 2025-05-07T19:42:58.3690476Z core id : 2 2025-05-07T19:42:58.3690560Z cpu cores : 24 2025-05-07T19:42:58.3690651Z apicid : 69 2025-05-07T19:42:58.3690738Z initial apicid : 69 2025-05-07T19:42:58.3690816Z fpu : yes 2025-05-07T19:42:58.3690902Z fpu_exception : yes 2025-05-07T19:42:58.3690997Z cpuid level : 13 2025-05-07T19:42:58.3691131Z wp : yes 2025-05-07T19:42:58.3693405Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3694005Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3694089Z bogomips : 6000.01 2025-05-07T19:42:58.3694169Z clflush size : 64 2025-05-07T19:42:58.3694259Z cache_alignment : 64 2025-05-07T19:42:58.3694384Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3694471Z power management: 2025-05-07T19:42:58.3694475Z 2025-05-07T19:42:58.3694600Z processor : 75 2025-05-07T19:42:58.3694682Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3694761Z cpu family : 6 2025-05-07T19:42:58.3694908Z model : 85 2025-05-07T19:42:58.3695082Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3695165Z stepping : 7 2025-05-07T19:42:58.3695248Z microcode : 0x5003901 2025-05-07T19:42:58.3695505Z cpu MHz : 3000.006 2025-05-07T19:42:58.3695593Z cache size : 36608 KB 2025-05-07T19:42:58.3695674Z physical id : 1 2025-05-07T19:42:58.3695751Z siblings : 48 2025-05-07T19:42:58.3695836Z core id : 3 2025-05-07T19:42:58.3695917Z cpu cores : 24 2025-05-07T19:42:58.3695992Z apicid : 71 2025-05-07T19:42:58.3696088Z initial apicid : 71 2025-05-07T19:42:58.3696171Z fpu : yes 2025-05-07T19:42:58.3696288Z fpu_exception : yes 2025-05-07T19:42:58.3696393Z cpuid level : 13 2025-05-07T19:42:58.3696482Z wp : yes 2025-05-07T19:42:58.3698653Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3699045Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3699132Z bogomips : 6000.01 2025-05-07T19:42:58.3699212Z clflush size : 64 2025-05-07T19:42:58.3699298Z cache_alignment : 64 2025-05-07T19:42:58.3699441Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3699526Z power management: 2025-05-07T19:42:58.3699530Z 2025-05-07T19:42:58.3699606Z processor : 76 2025-05-07T19:42:58.3699700Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3699777Z cpu family : 6 2025-05-07T19:42:58.3699851Z model : 85 2025-05-07T19:42:58.3700007Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3700093Z stepping : 7 2025-05-07T19:42:58.3700173Z microcode : 0x5003901 2025-05-07T19:42:58.3700250Z cpu MHz : 3128.385 2025-05-07T19:42:58.3700337Z cache size : 36608 KB 2025-05-07T19:42:58.3700415Z physical id : 1 2025-05-07T19:42:58.3700489Z siblings : 48 2025-05-07T19:42:58.3700565Z core id : 4 2025-05-07T19:42:58.3700653Z cpu cores : 24 2025-05-07T19:42:58.3700729Z apicid : 73 2025-05-07T19:42:58.3700808Z initial apicid : 73 2025-05-07T19:42:58.3700899Z fpu : yes 2025-05-07T19:42:58.3700979Z fpu_exception : yes 2025-05-07T19:42:58.3701054Z cpuid level : 13 2025-05-07T19:42:58.3701125Z wp : yes 2025-05-07T19:42:58.3703314Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3703759Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3703848Z bogomips : 6000.01 2025-05-07T19:42:58.3703990Z clflush size : 64 2025-05-07T19:42:58.3704075Z cache_alignment : 64 2025-05-07T19:42:58.3704200Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3704288Z power management: 2025-05-07T19:42:58.3704297Z 2025-05-07T19:42:58.3704378Z processor : 77 2025-05-07T19:42:58.3704468Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3704549Z cpu family : 6 2025-05-07T19:42:58.3704623Z model : 85 2025-05-07T19:42:58.3704779Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3704859Z stepping : 7 2025-05-07T19:42:58.3704949Z microcode : 0x5003901 2025-05-07T19:42:58.3705028Z cpu MHz : 3248.990 2025-05-07T19:42:58.3705106Z cache size : 36608 KB 2025-05-07T19:42:58.3705195Z physical id : 1 2025-05-07T19:42:58.3705276Z siblings : 48 2025-05-07T19:42:58.3705351Z core id : 5 2025-05-07T19:42:58.3705429Z cpu cores : 24 2025-05-07T19:42:58.3705516Z apicid : 75 2025-05-07T19:42:58.3705596Z initial apicid : 75 2025-05-07T19:42:58.3705672Z fpu : yes 2025-05-07T19:42:58.3705766Z fpu_exception : yes 2025-05-07T19:42:58.3705848Z cpuid level : 13 2025-05-07T19:42:58.3705924Z wp : yes 2025-05-07T19:42:58.3708114Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3708612Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3708694Z bogomips : 6000.01 2025-05-07T19:42:58.3708781Z clflush size : 64 2025-05-07T19:42:58.3708869Z cache_alignment : 64 2025-05-07T19:42:58.3708992Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3709073Z power management: 2025-05-07T19:42:58.3709079Z 2025-05-07T19:42:58.3709166Z processor : 78 2025-05-07T19:42:58.3709255Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3709335Z cpu family : 6 2025-05-07T19:42:58.3709421Z model : 85 2025-05-07T19:42:58.3709577Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3709659Z stepping : 7 2025-05-07T19:42:58.3709739Z microcode : 0x5003901 2025-05-07T19:42:58.3709823Z cpu MHz : 3177.990 2025-05-07T19:42:58.3709907Z cache size : 36608 KB 2025-05-07T19:42:58.3709988Z physical id : 1 2025-05-07T19:42:58.3710075Z siblings : 48 2025-05-07T19:42:58.3710152Z core id : 6 2025-05-07T19:42:58.3710230Z cpu cores : 24 2025-05-07T19:42:58.3710308Z apicid : 77 2025-05-07T19:42:58.3710400Z initial apicid : 77 2025-05-07T19:42:58.3710473Z fpu : yes 2025-05-07T19:42:58.3710554Z fpu_exception : yes 2025-05-07T19:42:58.3710644Z cpuid level : 13 2025-05-07T19:42:58.3710717Z wp : yes 2025-05-07T19:42:58.3712836Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3713277Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3713358Z bogomips : 6000.01 2025-05-07T19:42:58.3713434Z clflush size : 64 2025-05-07T19:42:58.3713640Z cache_alignment : 64 2025-05-07T19:42:58.3713802Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3713878Z power management: 2025-05-07T19:42:58.3713882Z 2025-05-07T19:42:58.3713958Z processor : 79 2025-05-07T19:42:58.3714049Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3714125Z cpu family : 6 2025-05-07T19:42:58.3714199Z model : 85 2025-05-07T19:42:58.3714360Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3714433Z stepping : 7 2025-05-07T19:42:58.3714508Z microcode : 0x5003901 2025-05-07T19:42:58.3714578Z cpu MHz : 3162.013 2025-05-07T19:42:58.3714667Z cache size : 36608 KB 2025-05-07T19:42:58.3714739Z physical id : 1 2025-05-07T19:42:58.3714815Z siblings : 48 2025-05-07T19:42:58.3714897Z core id : 7 2025-05-07T19:42:58.3714977Z cpu cores : 24 2025-05-07T19:42:58.3715046Z apicid : 79 2025-05-07T19:42:58.3715125Z initial apicid : 79 2025-05-07T19:42:58.3715199Z fpu : yes 2025-05-07T19:42:58.3715278Z fpu_exception : yes 2025-05-07T19:42:58.3715350Z cpuid level : 13 2025-05-07T19:42:58.3715419Z wp : yes 2025-05-07T19:42:58.3717440Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3717802Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3717888Z bogomips : 6000.01 2025-05-07T19:42:58.3717961Z clflush size : 64 2025-05-07T19:42:58.3718042Z cache_alignment : 64 2025-05-07T19:42:58.3718163Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3718243Z power management: 2025-05-07T19:42:58.3718247Z 2025-05-07T19:42:58.3718317Z processor : 80 2025-05-07T19:42:58.3718399Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3718478Z cpu family : 6 2025-05-07T19:42:58.3718555Z model : 85 2025-05-07T19:42:58.3718697Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3718784Z stepping : 7 2025-05-07T19:42:58.3718861Z microcode : 0x5003901 2025-05-07T19:42:58.3718932Z cpu MHz : 3000.006 2025-05-07T19:42:58.3719005Z cache size : 36608 KB 2025-05-07T19:42:58.3719093Z physical id : 1 2025-05-07T19:42:58.3719164Z siblings : 48 2025-05-07T19:42:58.3719232Z core id : 8 2025-05-07T19:42:58.3719303Z cpu cores : 24 2025-05-07T19:42:58.3719382Z apicid : 81 2025-05-07T19:42:58.3719458Z initial apicid : 81 2025-05-07T19:42:58.3719523Z fpu : yes 2025-05-07T19:42:58.3719604Z fpu_exception : yes 2025-05-07T19:42:58.3719679Z cpuid level : 13 2025-05-07T19:42:58.3719750Z wp : yes 2025-05-07T19:42:58.3721773Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3722181Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3722259Z bogomips : 6000.01 2025-05-07T19:42:58.3722337Z clflush size : 64 2025-05-07T19:42:58.3722413Z cache_alignment : 64 2025-05-07T19:42:58.3722537Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3722616Z power management: 2025-05-07T19:42:58.3722678Z 2025-05-07T19:42:58.3722752Z processor : 81 2025-05-07T19:42:58.3722834Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3722909Z cpu family : 6 2025-05-07T19:42:58.3722986Z model : 85 2025-05-07T19:42:58.3723133Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3723204Z stepping : 7 2025-05-07T19:42:58.3723296Z microcode : 0x5003901 2025-05-07T19:42:58.3723368Z cpu MHz : 3164.733 2025-05-07T19:42:58.3723442Z cache size : 36608 KB 2025-05-07T19:42:58.3723514Z physical id : 1 2025-05-07T19:42:58.3723593Z siblings : 48 2025-05-07T19:42:58.3723662Z core id : 9 2025-05-07T19:42:58.3723733Z cpu cores : 24 2025-05-07T19:42:58.3723806Z apicid : 83 2025-05-07T19:42:58.3723890Z initial apicid : 83 2025-05-07T19:42:58.3723959Z fpu : yes 2025-05-07T19:42:58.3724041Z fpu_exception : yes 2025-05-07T19:42:58.3724124Z cpuid level : 13 2025-05-07T19:42:58.3724192Z wp : yes 2025-05-07T19:42:58.3726206Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3726573Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3726647Z bogomips : 6000.01 2025-05-07T19:42:58.3726718Z clflush size : 64 2025-05-07T19:42:58.3726801Z cache_alignment : 64 2025-05-07T19:42:58.3726922Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3726999Z power management: 2025-05-07T19:42:58.3727003Z 2025-05-07T19:42:58.3727084Z processor : 82 2025-05-07T19:42:58.3727177Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3727248Z cpu family : 6 2025-05-07T19:42:58.3727319Z model : 85 2025-05-07T19:42:58.3727473Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3727550Z stepping : 7 2025-05-07T19:42:58.3727624Z microcode : 0x5003901 2025-05-07T19:42:58.3727697Z cpu MHz : 3189.707 2025-05-07T19:42:58.3727783Z cache size : 36608 KB 2025-05-07T19:42:58.3727858Z physical id : 1 2025-05-07T19:42:58.3727929Z siblings : 48 2025-05-07T19:42:58.3728008Z core id : 10 2025-05-07T19:42:58.3728079Z cpu cores : 24 2025-05-07T19:42:58.3728153Z apicid : 85 2025-05-07T19:42:58.3728228Z initial apicid : 85 2025-05-07T19:42:58.3728313Z fpu : yes 2025-05-07T19:42:58.3728388Z fpu_exception : yes 2025-05-07T19:42:58.3728459Z cpuid level : 13 2025-05-07T19:42:58.3728535Z wp : yes 2025-05-07T19:42:58.3730548Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3730948Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3731034Z bogomips : 6000.01 2025-05-07T19:42:58.3731109Z clflush size : 64 2025-05-07T19:42:58.3731183Z cache_alignment : 64 2025-05-07T19:42:58.3731306Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3731385Z power management: 2025-05-07T19:42:58.3731389Z 2025-05-07T19:42:58.3731465Z processor : 83 2025-05-07T19:42:58.3731592Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3731671Z cpu family : 6 2025-05-07T19:42:58.3731743Z model : 85 2025-05-07T19:42:58.3731888Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3731966Z stepping : 7 2025-05-07T19:42:58.3732048Z microcode : 0x5003901 2025-05-07T19:42:58.3732122Z cpu MHz : 3142.517 2025-05-07T19:42:58.3732200Z cache size : 36608 KB 2025-05-07T19:42:58.3732281Z physical id : 1 2025-05-07T19:42:58.3732350Z siblings : 48 2025-05-07T19:42:58.3732419Z core id : 11 2025-05-07T19:42:58.3732495Z cpu cores : 24 2025-05-07T19:42:58.3732565Z apicid : 87 2025-05-07T19:42:58.3732637Z initial apicid : 87 2025-05-07T19:42:58.3732709Z fpu : yes 2025-05-07T19:42:58.3732788Z fpu_exception : yes 2025-05-07T19:42:58.3732859Z cpuid level : 13 2025-05-07T19:42:58.3732930Z wp : yes 2025-05-07T19:42:58.3735015Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3735554Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3735634Z bogomips : 6000.01 2025-05-07T19:42:58.3735721Z clflush size : 64 2025-05-07T19:42:58.3735802Z cache_alignment : 64 2025-05-07T19:42:58.3735930Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3736022Z power management: 2025-05-07T19:42:58.3736026Z 2025-05-07T19:42:58.3736104Z processor : 84 2025-05-07T19:42:58.3736209Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3736287Z cpu family : 6 2025-05-07T19:42:58.3736367Z model : 85 2025-05-07T19:42:58.3736521Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3736600Z stepping : 7 2025-05-07T19:42:58.3736688Z microcode : 0x5003901 2025-05-07T19:42:58.3736769Z cpu MHz : 3121.541 2025-05-07T19:42:58.3736849Z cache size : 36608 KB 2025-05-07T19:42:58.3736924Z physical id : 1 2025-05-07T19:42:58.3737004Z siblings : 48 2025-05-07T19:42:58.3737078Z core id : 12 2025-05-07T19:42:58.3737153Z cpu cores : 24 2025-05-07T19:42:58.3737231Z apicid : 89 2025-05-07T19:42:58.3737314Z initial apicid : 89 2025-05-07T19:42:58.3737390Z fpu : yes 2025-05-07T19:42:58.3737474Z fpu_exception : yes 2025-05-07T19:42:58.3737558Z cpuid level : 13 2025-05-07T19:42:58.3737633Z wp : yes 2025-05-07T19:42:58.3739806Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3740261Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3740341Z bogomips : 6000.01 2025-05-07T19:42:58.3740421Z clflush size : 64 2025-05-07T19:42:58.3740511Z cache_alignment : 64 2025-05-07T19:42:58.3740638Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3740722Z power management: 2025-05-07T19:42:58.3740726Z 2025-05-07T19:42:58.3740815Z processor : 85 2025-05-07T19:42:58.3740901Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3740981Z cpu family : 6 2025-05-07T19:42:58.3741057Z model : 85 2025-05-07T19:42:58.3741290Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3741370Z stepping : 7 2025-05-07T19:42:58.3741453Z microcode : 0x5003901 2025-05-07T19:42:58.3741544Z cpu MHz : 3161.565 2025-05-07T19:42:58.3741634Z cache size : 36608 KB 2025-05-07T19:42:58.3741715Z physical id : 1 2025-05-07T19:42:58.3741792Z siblings : 48 2025-05-07T19:42:58.3741878Z core id : 13 2025-05-07T19:42:58.3741963Z cpu cores : 24 2025-05-07T19:42:58.3742039Z apicid : 91 2025-05-07T19:42:58.3742131Z initial apicid : 91 2025-05-07T19:42:58.3742210Z fpu : yes 2025-05-07T19:42:58.3742300Z fpu_exception : yes 2025-05-07T19:42:58.3742376Z cpuid level : 13 2025-05-07T19:42:58.3742456Z wp : yes 2025-05-07T19:42:58.3744623Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3745005Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3745092Z bogomips : 6000.01 2025-05-07T19:42:58.3745172Z clflush size : 64 2025-05-07T19:42:58.3745256Z cache_alignment : 64 2025-05-07T19:42:58.3745392Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3745476Z power management: 2025-05-07T19:42:58.3745480Z 2025-05-07T19:42:58.3745559Z processor : 86 2025-05-07T19:42:58.3745656Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3745735Z cpu family : 6 2025-05-07T19:42:58.3745815Z model : 85 2025-05-07T19:42:58.3745973Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3746066Z stepping : 7 2025-05-07T19:42:58.3746148Z microcode : 0x5003901 2025-05-07T19:42:58.3746225Z cpu MHz : 3160.069 2025-05-07T19:42:58.3746316Z cache size : 36608 KB 2025-05-07T19:42:58.3746399Z physical id : 1 2025-05-07T19:42:58.3746488Z siblings : 48 2025-05-07T19:42:58.3746561Z core id : 14 2025-05-07T19:42:58.3746651Z cpu cores : 24 2025-05-07T19:42:58.3746729Z apicid : 93 2025-05-07T19:42:58.3746810Z initial apicid : 93 2025-05-07T19:42:58.3746883Z fpu : yes 2025-05-07T19:42:58.3746976Z fpu_exception : yes 2025-05-07T19:42:58.3747057Z cpuid level : 13 2025-05-07T19:42:58.3747130Z wp : yes 2025-05-07T19:42:58.3749283Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3749695Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3749769Z bogomips : 6000.01 2025-05-07T19:42:58.3749854Z clflush size : 64 2025-05-07T19:42:58.3749933Z cache_alignment : 64 2025-05-07T19:42:58.3750048Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3750134Z power management: 2025-05-07T19:42:58.3750138Z 2025-05-07T19:42:58.3750211Z processor : 87 2025-05-07T19:42:58.3750323Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3750408Z cpu family : 6 2025-05-07T19:42:58.3750480Z model : 85 2025-05-07T19:42:58.3750625Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3750743Z stepping : 7 2025-05-07T19:42:58.3750837Z microcode : 0x5003901 2025-05-07T19:42:58.3750911Z cpu MHz : 3323.073 2025-05-07T19:42:58.3750984Z cache size : 36608 KB 2025-05-07T19:42:58.3751073Z physical id : 1 2025-05-07T19:42:58.3751150Z siblings : 48 2025-05-07T19:42:58.3751221Z core id : 15 2025-05-07T19:42:58.3751293Z cpu cores : 24 2025-05-07T19:42:58.3751379Z apicid : 95 2025-05-07T19:42:58.3751463Z initial apicid : 95 2025-05-07T19:42:58.3751533Z fpu : yes 2025-05-07T19:42:58.3751608Z fpu_exception : yes 2025-05-07T19:42:58.3751692Z cpuid level : 13 2025-05-07T19:42:58.3751764Z wp : yes 2025-05-07T19:42:58.3754114Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3754499Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3754579Z bogomips : 6000.01 2025-05-07T19:42:58.3754657Z clflush size : 64 2025-05-07T19:42:58.3754747Z cache_alignment : 64 2025-05-07T19:42:58.3754873Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3754951Z power management: 2025-05-07T19:42:58.3754955Z 2025-05-07T19:42:58.3755039Z processor : 88 2025-05-07T19:42:58.3755127Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3755201Z cpu family : 6 2025-05-07T19:42:58.3755272Z model : 85 2025-05-07T19:42:58.3755429Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3755507Z stepping : 7 2025-05-07T19:42:58.3755588Z microcode : 0x5003901 2025-05-07T19:42:58.3755670Z cpu MHz : 3082.968 2025-05-07T19:42:58.3755749Z cache size : 36608 KB 2025-05-07T19:42:58.3755827Z physical id : 1 2025-05-07T19:42:58.3755902Z siblings : 48 2025-05-07T19:42:58.3755986Z core id : 16 2025-05-07T19:42:58.3756061Z cpu cores : 24 2025-05-07T19:42:58.3756131Z apicid : 97 2025-05-07T19:42:58.3756217Z initial apicid : 97 2025-05-07T19:42:58.3756291Z fpu : yes 2025-05-07T19:42:58.3756373Z fpu_exception : yes 2025-05-07T19:42:58.3756448Z cpuid level : 13 2025-05-07T19:42:58.3756527Z wp : yes 2025-05-07T19:42:58.3758629Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3759009Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3759135Z bogomips : 6000.01 2025-05-07T19:42:58.3759212Z clflush size : 64 2025-05-07T19:42:58.3759301Z cache_alignment : 64 2025-05-07T19:42:58.3759438Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3759516Z power management: 2025-05-07T19:42:58.3759520Z 2025-05-07T19:42:58.3759602Z processor : 89 2025-05-07T19:42:58.3759694Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3759780Z cpu family : 6 2025-05-07T19:42:58.3759854Z model : 85 2025-05-07T19:42:58.3760013Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3760103Z stepping : 7 2025-05-07T19:42:58.3760184Z microcode : 0x5003901 2025-05-07T19:42:58.3760310Z cpu MHz : 3153.250 2025-05-07T19:42:58.3760396Z cache size : 36608 KB 2025-05-07T19:42:58.3760475Z physical id : 1 2025-05-07T19:42:58.3760552Z siblings : 48 2025-05-07T19:42:58.3760628Z core id : 17 2025-05-07T19:42:58.3760716Z cpu cores : 24 2025-05-07T19:42:58.3760789Z apicid : 99 2025-05-07T19:42:58.3760871Z initial apicid : 99 2025-05-07T19:42:58.3760958Z fpu : yes 2025-05-07T19:42:58.3761043Z fpu_exception : yes 2025-05-07T19:42:58.3761122Z cpuid level : 13 2025-05-07T19:42:58.3761195Z wp : yes 2025-05-07T19:42:58.3763325Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3763725Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3763810Z bogomips : 6000.01 2025-05-07T19:42:58.3763885Z clflush size : 64 2025-05-07T19:42:58.3763964Z cache_alignment : 64 2025-05-07T19:42:58.3764083Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3764172Z power management: 2025-05-07T19:42:58.3764176Z 2025-05-07T19:42:58.3764254Z processor : 90 2025-05-07T19:42:58.3764345Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3764433Z cpu family : 6 2025-05-07T19:42:58.3764507Z model : 85 2025-05-07T19:42:58.3764663Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3764739Z stepping : 7 2025-05-07T19:42:58.3764830Z microcode : 0x5003901 2025-05-07T19:42:58.3764910Z cpu MHz : 3790.927 2025-05-07T19:42:58.3764990Z cache size : 36608 KB 2025-05-07T19:42:58.3765081Z physical id : 1 2025-05-07T19:42:58.3765157Z siblings : 48 2025-05-07T19:42:58.3765235Z core id : 18 2025-05-07T19:42:58.3765313Z cpu cores : 24 2025-05-07T19:42:58.3765503Z apicid : 101 2025-05-07T19:42:58.3765586Z initial apicid : 101 2025-05-07T19:42:58.3765659Z fpu : yes 2025-05-07T19:42:58.3765740Z fpu_exception : yes 2025-05-07T19:42:58.3765814Z cpuid level : 13 2025-05-07T19:42:58.3765884Z wp : yes 2025-05-07T19:42:58.3767901Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3768261Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3768392Z bogomips : 6000.01 2025-05-07T19:42:58.3768480Z clflush size : 64 2025-05-07T19:42:58.3768556Z cache_alignment : 64 2025-05-07T19:42:58.3768672Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3768752Z power management: 2025-05-07T19:42:58.3768755Z 2025-05-07T19:42:58.3768837Z processor : 91 2025-05-07T19:42:58.3768917Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3768988Z cpu family : 6 2025-05-07T19:42:58.3769074Z model : 85 2025-05-07T19:42:58.3769219Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3769287Z stepping : 7 2025-05-07T19:42:58.3769364Z microcode : 0x5003901 2025-05-07T19:42:58.3769445Z cpu MHz : 3000.006 2025-05-07T19:42:58.3769521Z cache size : 36608 KB 2025-05-07T19:42:58.3770172Z physical id : 1 2025-05-07T19:42:58.3770267Z siblings : 48 2025-05-07T19:42:58.3770337Z core id : 19 2025-05-07T19:42:58.3770410Z cpu cores : 24 2025-05-07T19:42:58.3770482Z apicid : 103 2025-05-07T19:42:58.3770575Z initial apicid : 103 2025-05-07T19:42:58.3770651Z fpu : yes 2025-05-07T19:42:58.3770728Z fpu_exception : yes 2025-05-07T19:42:58.3770802Z cpuid level : 13 2025-05-07T19:42:58.3770879Z wp : yes 2025-05-07T19:42:58.3772889Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3773257Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3773331Z bogomips : 6000.01 2025-05-07T19:42:58.3773413Z clflush size : 64 2025-05-07T19:42:58.3773499Z cache_alignment : 64 2025-05-07T19:42:58.3773615Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3773692Z power management: 2025-05-07T19:42:58.3773695Z 2025-05-07T19:42:58.3773770Z processor : 92 2025-05-07T19:42:58.3773858Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3773931Z cpu family : 6 2025-05-07T19:42:58.3774005Z model : 85 2025-05-07T19:42:58.3774156Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3774232Z stepping : 7 2025-05-07T19:42:58.3774309Z microcode : 0x5003901 2025-05-07T19:42:58.3774387Z cpu MHz : 3000.006 2025-05-07T19:42:58.3774464Z cache size : 36608 KB 2025-05-07T19:42:58.3774541Z physical id : 1 2025-05-07T19:42:58.3774620Z siblings : 48 2025-05-07T19:42:58.3774893Z core id : 20 2025-05-07T19:42:58.3775137Z cpu cores : 24 2025-05-07T19:42:58.3775214Z apicid : 105 2025-05-07T19:42:58.3775301Z initial apicid : 105 2025-05-07T19:42:58.3775386Z fpu : yes 2025-05-07T19:42:58.3775476Z fpu_exception : yes 2025-05-07T19:42:58.3775559Z cpuid level : 13 2025-05-07T19:42:58.3775637Z wp : yes 2025-05-07T19:42:58.3777823Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3778210Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3778304Z bogomips : 6000.01 2025-05-07T19:42:58.3778384Z clflush size : 64 2025-05-07T19:42:58.3778576Z cache_alignment : 64 2025-05-07T19:42:58.3778707Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3778805Z power management: 2025-05-07T19:42:58.3778810Z 2025-05-07T19:42:58.3778898Z processor : 93 2025-05-07T19:42:58.3778990Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3779085Z cpu family : 6 2025-05-07T19:42:58.3779164Z model : 85 2025-05-07T19:42:58.3779327Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3779424Z stepping : 7 2025-05-07T19:42:58.3779510Z microcode : 0x5003901 2025-05-07T19:42:58.3779592Z cpu MHz : 3000.006 2025-05-07T19:42:58.3779677Z cache size : 36608 KB 2025-05-07T19:42:58.3779766Z physical id : 1 2025-05-07T19:42:58.3779841Z siblings : 48 2025-05-07T19:42:58.3779955Z core id : 21 2025-05-07T19:42:58.3780102Z cpu cores : 24 2025-05-07T19:42:58.3780193Z apicid : 107 2025-05-07T19:42:58.3780276Z initial apicid : 107 2025-05-07T19:42:58.3780352Z fpu : yes 2025-05-07T19:42:58.3780452Z fpu_exception : yes 2025-05-07T19:42:58.3780535Z cpuid level : 13 2025-05-07T19:42:58.3780612Z wp : yes 2025-05-07T19:42:58.3782802Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3783191Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3783276Z bogomips : 6000.01 2025-05-07T19:42:58.3783368Z clflush size : 64 2025-05-07T19:42:58.3783455Z cache_alignment : 64 2025-05-07T19:42:58.3783585Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3783674Z power management: 2025-05-07T19:42:58.3783688Z 2025-05-07T19:42:58.3783770Z processor : 94 2025-05-07T19:42:58.3783857Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3783938Z cpu family : 6 2025-05-07T19:42:58.3784029Z model : 85 2025-05-07T19:42:58.3784188Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3784272Z stepping : 7 2025-05-07T19:42:58.3784358Z microcode : 0x5003901 2025-05-07T19:42:58.3784452Z cpu MHz : 3000.006 2025-05-07T19:42:58.3784545Z cache size : 36608 KB 2025-05-07T19:42:58.3784626Z physical id : 1 2025-05-07T19:42:58.3784715Z siblings : 48 2025-05-07T19:42:58.3784798Z core id : 22 2025-05-07T19:42:58.3784882Z cpu cores : 24 2025-05-07T19:42:58.3784968Z apicid : 109 2025-05-07T19:42:58.3785063Z initial apicid : 109 2025-05-07T19:42:58.3785138Z fpu : yes 2025-05-07T19:42:58.3785224Z fpu_exception : yes 2025-05-07T19:42:58.3785317Z cpuid level : 13 2025-05-07T19:42:58.3785399Z wp : yes 2025-05-07T19:42:58.3787570Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3787974Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3788055Z bogomips : 6000.01 2025-05-07T19:42:58.3788246Z clflush size : 64 2025-05-07T19:42:58.3788447Z cache_alignment : 64 2025-05-07T19:42:58.3788573Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3788702Z power management: 2025-05-07T19:42:58.3788706Z 2025-05-07T19:42:58.3788778Z processor : 95 2025-05-07T19:42:58.3788874Z vendor_id : GenuineIntel 2025-05-07T19:42:58.3788949Z cpu family : 6 2025-05-07T19:42:58.3789020Z model : 85 2025-05-07T19:42:58.3789179Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:42:58.3789253Z stepping : 7 2025-05-07T19:42:58.3789332Z microcode : 0x5003901 2025-05-07T19:42:58.3789404Z cpu MHz : 3152.561 2025-05-07T19:42:58.3789493Z cache size : 36608 KB 2025-05-07T19:42:58.3789569Z physical id : 1 2025-05-07T19:42:58.3789648Z siblings : 48 2025-05-07T19:42:58.3789738Z core id : 23 2025-05-07T19:42:58.3789817Z cpu cores : 24 2025-05-07T19:42:58.3789889Z apicid : 111 2025-05-07T19:42:58.3790023Z initial apicid : 111 2025-05-07T19:42:58.3790107Z fpu : yes 2025-05-07T19:42:58.3790187Z fpu_exception : yes 2025-05-07T19:42:58.3790263Z cpuid level : 13 2025-05-07T19:42:58.3790342Z wp : yes 2025-05-07T19:42:58.3792343Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:42:58.3792701Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:42:58.3792786Z bogomips : 6000.01 2025-05-07T19:42:58.3792861Z clflush size : 64 2025-05-07T19:42:58.3792937Z cache_alignment : 64 2025-05-07T19:42:58.3793066Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:42:58.3793143Z power management: 2025-05-07T19:42:58.3793148Z 2025-05-07T19:42:58.3793151Z 2025-05-07T19:42:58.3793255Z ################################################################################ 2025-05-07T19:42:58.3793356Z [INFO] Print PCI info ... 2025-05-07T19:42:58.3793430Z + lspci -v 2025-05-07T19:42:58.3793435Z 2025-05-07T19:42:58.3793601Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-05-07T19:42:58.3793699Z Subsystem: Amazon.com, Inc. Device 1237 2025-05-07T19:42:58.3793816Z Flags: bus master, medium devsel, latency 0 2025-05-07T19:42:58.3793821Z 2025-05-07T19:42:58.3794008Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-05-07T19:42:58.3794085Z Physical Slot: 1 2025-05-07T19:42:58.3794204Z Flags: bus master, fast devsel, latency 0 2025-05-07T19:42:58.3794208Z 2025-05-07T19:42:58.3794444Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-05-07T19:42:58.3794522Z Physical Slot: 1 2025-05-07T19:42:58.3794656Z Flags: bus master, fast devsel, latency 0, IRQ 9 2025-05-07T19:42:58.3794660Z 2025-05-07T19:42:58.3794918Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 (prog-if 00 [VGA controller]) 2025-05-07T19:42:58.3794998Z Physical Slot: 3 2025-05-07T19:42:58.3795107Z Flags: bus master, fast devsel, latency 0 2025-05-07T19:42:58.3795235Z Memory at c0000000 (32-bit, prefetchable) [size=4M] 2025-05-07T19:42:58.3795347Z Expansion ROM at 000c0000 [disabled] [size=128K] 2025-05-07T19:42:58.3795351Z 2025-05-07T19:42:58.3795636Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller (prog-if 02 [NVM Express]) 2025-05-07T19:42:58.3795741Z Subsystem: Amazon.com, Inc. Device 0000 2025-05-07T19:42:58.3795821Z Physical Slot: 4 2025-05-07T19:42:58.3795949Z Flags: bus master, fast devsel, latency 0, IRQ 11 2025-05-07T19:42:58.3796099Z Memory at c0514000 (32-bit, non-prefetchable) [size=16K] 2025-05-07T19:42:58.3796264Z Capabilities: 2025-05-07T19:42:58.3796400Z Kernel driver in use: nvme 2025-05-07T19:42:58.3796404Z 2025-05-07T19:42:58.3796617Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-05-07T19:42:58.3796691Z Physical Slot: 5 2025-05-07T19:42:58.3796791Z Flags: bus master, fast devsel, latency 0 2025-05-07T19:42:58.3796931Z Memory at c0510000 (32-bit, non-prefetchable) [size=16K] 2025-05-07T19:42:58.3797065Z Memory at c0400000 (32-bit, prefetchable) [size=1M] 2025-05-07T19:42:58.3797208Z Memory at c0500000 (32-bit, non-prefetchable) [size=64K] 2025-05-07T19:42:58.3797297Z Capabilities: 2025-05-07T19:42:58.3797393Z Kernel driver in use: ena 2025-05-07T19:42:58.3797397Z 2025-05-07T19:42:58.3797400Z 2025-05-07T19:42:58.3797553Z ################################################################################ 2025-05-07T19:42:58.3797659Z [INFO] Print Linux distribution info ... 2025-05-07T19:42:58.3797739Z + uname -a 2025-05-07T19:42:58.3797743Z 2025-05-07T19:42:58.3798111Z Linux 619e625bd71e 6.1.130-139.222.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Mar 11 01:10:58 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux 2025-05-07T19:42:58.3798119Z 2025-05-07T19:42:58.3798191Z + uname -m 2025-05-07T19:42:58.3798195Z 2025-05-07T19:42:58.3798273Z x86_64 2025-05-07T19:42:58.3798277Z 2025-05-07T19:42:58.3798362Z + cat /proc/version 2025-05-07T19:42:58.3798366Z 2025-05-07T19:42:58.3798927Z Linux version 6.1.130-139.222.amzn2023.x86_64 (mockbuild@ip-10-0-55-76) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5), GNU ld version 2.39-6.amzn2023.0.11) #1 SMP PREEMPT_DYNAMIC Tue Mar 11 01:10:58 UTC 2025 2025-05-07T19:42:58.3798932Z 2025-05-07T19:42:58.3799021Z + cat /etc/os-release 2025-05-07T19:42:58.3799025Z 2025-05-07T19:42:58.3799104Z NAME="Amazon Linux" 2025-05-07T19:42:58.3799179Z VERSION="2023" 2025-05-07T19:42:58.3799265Z ID="amzn" 2025-05-07T19:42:58.3799340Z ID_LIKE="fedora" 2025-05-07T19:42:58.3799419Z VERSION_ID="2023" 2025-05-07T19:42:58.3799512Z PLATFORM_ID="platform:al2023" 2025-05-07T19:42:58.3799632Z PRETTY_NAME="Amazon Linux 2023.7.20250428" 2025-05-07T19:42:58.3799723Z ANSI_COLOR="0;33" 2025-05-07T19:42:58.3799839Z CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2023" 2025-05-07T19:42:58.3800022Z HOME_URL="https://aws.amazon.com/linux/amazon-linux-2023/" 2025-05-07T19:42:58.3800176Z DOCUMENTATION_URL="https://docs.aws.amazon.com/linux/" 2025-05-07T19:42:58.3800327Z SUPPORT_URL="https://aws.amazon.com/premiumsupport/" 2025-05-07T19:42:58.3800511Z BUG_REPORT_URL="https://github.com/amazonlinux/amazon-linux-2023" 2025-05-07T19:42:58.3800607Z VENDOR_NAME="AWS" 2025-05-07T19:42:58.3800712Z VENDOR_URL="https://aws.amazon.com/" 2025-05-07T19:42:58.3800796Z SUPPORT_END="2029-06-30" 2025-05-07T19:42:58.3800800Z 2025-05-07T19:42:58.3835064Z ##[group]Run . $PRELUDE; print_gpu_info 2025-05-07T19:42:58.3835215Z . $PRELUDE; print_gpu_info 2025-05-07T19:42:58.3835521Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:42:58.3835599Z env: 2025-05-07T19:42:58.3835715Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:42:58.3835814Z BUILD_ENV: build_binary 2025-05-07T19:42:58.3835922Z BUILD_TARGET: genai 2025-05-07T19:42:58.3836003Z BUILD_VARIANT: cuda 2025-05-07T19:42:58.3836095Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:42:58.3836197Z ##[endgroup] 2025-05-07T19:42:58.7982109Z ################################################################################ 2025-05-07T19:42:58.7999980Z [INFO] Printing general display info ... 2025-05-07T19:42:58.8001311Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:42:58.8869704Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:42:58.8873383Z /usr/bin/sudo 2025-05-07T19:42:58.8884821Z which: no apt-get in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:42:58.8891087Z /usr/bin/yum 2025-05-07T19:42:58.8891971Z [INSTALL] Updating system repositories ... 2025-05-07T19:42:58.8914046Z [EXEC] [ATTEMPT 0/3] + sudo yum update -y 2025-05-07T19:42:59.1063714Z Last metadata expiration check: 0:00:18 ago on Wed May 7 19:42:41 2025. 2025-05-07T19:42:59.2018939Z Dependencies resolved. 2025-05-07T19:42:59.2235702Z Nothing to do. 2025-05-07T19:42:59.2236401Z Complete! 2025-05-07T19:42:59.2606294Z [INSTALL] Installing system package(s): hostname lshw ... 2025-05-07T19:42:59.2628984Z [EXEC] [ATTEMPT 0/3] + sudo yum install -y hostname lshw 2025-05-07T19:42:59.4900825Z Last metadata expiration check: 0:00:18 ago on Wed May 7 19:42:41 2025. 2025-05-07T19:42:59.5453795Z Dependencies resolved. 2025-05-07T19:42:59.5623165Z ================================================================================ 2025-05-07T19:42:59.5623739Z Package Arch Version Repository Size 2025-05-07T19:42:59.5624156Z ================================================================================ 2025-05-07T19:42:59.5624493Z Installing: 2025-05-07T19:42:59.5624808Z hostname x86_64 3.23-4.amzn2023.0.3 amazonlinux 28 k 2025-05-07T19:42:59.5625290Z lshw x86_64 B.02.19.2-7.amzn2023.0.3 amazonlinux 319 k 2025-05-07T19:42:59.5625590Z 2025-05-07T19:42:59.5625708Z Transaction Summary 2025-05-07T19:42:59.5625961Z ================================================================================ 2025-05-07T19:42:59.5626303Z Install 2 Packages 2025-05-07T19:42:59.5626438Z 2025-05-07T19:42:59.5626549Z Total download size: 347 k 2025-05-07T19:42:59.5626821Z Installed size: 883 k 2025-05-07T19:42:59.5627057Z Downloading Packages: 2025-05-07T19:42:59.8411418Z (1/2): hostname-3.23-4.amzn2023.0.3.x86_64.rpm 1.5 MB/s | 28 kB 00:00 2025-05-07T19:42:59.8486364Z (2/2): lshw-B.02.19.2-7.amzn2023.0.3.x86_64.rpm 12 MB/s | 319 kB 00:00 2025-05-07T19:42:59.8493971Z -------------------------------------------------------------------------------- 2025-05-07T19:42:59.8496943Z Total 1.2 MB/s | 347 kB 00:00 2025-05-07T19:42:59.8739936Z Running transaction check 2025-05-07T19:42:59.8796522Z Transaction check succeeded. 2025-05-07T19:42:59.8797052Z Running transaction test 2025-05-07T19:42:59.8960186Z Transaction test succeeded. 2025-05-07T19:42:59.8960618Z Running transaction 2025-05-07T19:42:59.9248211Z Preparing : 1/1 2025-05-07T19:42:59.9317506Z Installing : lshw-B.02.19.2-7.amzn2023.0.3.x86_64 1/2 2025-05-07T19:42:59.9348409Z Installing : hostname-3.23-4.amzn2023.0.3.x86_64 2/2 2025-05-07T19:43:00.9747698Z Running scriptlet: hostname-3.23-4.amzn2023.0.3.x86_64 2/2 2025-05-07T19:43:00.9748858Z Verifying : hostname-3.23-4.amzn2023.0.3.x86_64 1/2 2025-05-07T19:43:01.0124721Z Verifying : lshw-B.02.19.2-7.amzn2023.0.3.x86_64 2/2 2025-05-07T19:43:01.0125754Z 2025-05-07T19:43:01.0126061Z Installed: 2025-05-07T19:43:01.0127068Z hostname-3.23-4.amzn2023.0.3.x86_64 lshw-B.02.19.2-7.amzn2023.0.3.x86_64 2025-05-07T19:43:01.0128124Z 2025-05-07T19:43:01.0128370Z Complete! 2025-05-07T19:43:01.0558199Z + hostname 2025-05-07T19:43:01.0558668Z 2025-05-07T19:43:01.0562055Z 619e625bd71e 2025-05-07T19:43:01.0562658Z 2025-05-07T19:43:01.0563396Z + sudo lshw -C display 2025-05-07T19:43:01.0563624Z 2025-05-07T19:43:01.2527027Z *-display UNCLAIMED 2025-05-07T19:43:01.2527994Z description: VGA compatible controller 2025-05-07T19:43:01.2528967Z product: Amazon.com, Inc. 2025-05-07T19:43:01.2529277Z vendor: Amazon.com, Inc. 2025-05-07T19:43:01.2529593Z physical id: 3 2025-05-07T19:43:01.2529857Z bus info: pci@0000:00:03.0 2025-05-07T19:43:01.2530170Z version: 00 2025-05-07T19:43:01.2530424Z width: 32 bits 2025-05-07T19:43:01.2530811Z clock: 33MHz 2025-05-07T19:43:01.2531070Z capabilities: vga_controller bus_master 2025-05-07T19:43:01.2531426Z configuration: latency=0 2025-05-07T19:43:01.2531783Z resources: memory:c0000000-c03fffff memory:c0000-dffff 2025-05-07T19:43:01.2544896Z 2025-05-07T19:43:01.2545247Z ################################################################################ 2025-05-07T19:43:01.2545739Z [INFO] Printing NVIDIA GPU info ... 2025-05-07T19:43:01.2657683Z lspci: Unable to load libkmod resources: error -2 2025-05-07T19:43:01.2682083Z which: no nvidia-smi in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:01.2682631Z [CHECK] nvidia-smi not found 2025-05-07T19:43:01.2682975Z ################################################################################ 2025-05-07T19:43:01.2683335Z [INFO] Printing AMD GPU info ... 2025-05-07T19:43:01.2785820Z lspci: Unable to load libkmod resources: error -2 2025-05-07T19:43:01.2813614Z which: no rocminfo in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:01.2814155Z [CHECK] rocminfo not found 2025-05-07T19:43:01.2828386Z which: no rocm-smi in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:01.2828950Z [CHECK] rocm-smi not found 2025-05-07T19:43:01.2925340Z ##[group]Run . $PRELUDE; setup_miniconda $HOME/miniconda 2025-05-07T19:43:01.2925834Z . $PRELUDE; setup_miniconda $HOME/miniconda 2025-05-07T19:43:01.2926413Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:43:01.2926760Z env: 2025-05-07T19:43:01.2927024Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:43:01.2927338Z BUILD_ENV: build_binary 2025-05-07T19:43:01.2927612Z BUILD_TARGET: genai 2025-05-07T19:43:01.2927849Z BUILD_VARIANT: cuda 2025-05-07T19:43:01.2928120Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:43:01.2928385Z ##[endgroup] 2025-05-07T19:43:01.7165104Z ################################################################################ 2025-05-07T19:43:01.7165568Z # Setup Miniconda 2025-05-07T19:43:01.7165815Z # 2025-05-07T19:43:01.7178311Z # [2025-05-07T19:43:01.717Z] + setup_miniconda /github/home/miniconda 2025-05-07T19:43:01.7179581Z ################################################################################ 2025-05-07T19:43:01.7180413Z 2025-05-07T19:43:01.7196651Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:43:01.8013163Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:43:01.8014342Z + mkdir -p /github/home/miniconda 2025-05-07T19:43:01.8014584Z 2025-05-07T19:43:01.8024636Z 2025-05-07T19:43:01.8025327Z [SETUP] Downloading the Miniconda installer ... 2025-05-07T19:43:01.8047386Z [EXEC] [ATTEMPT 0/3] + wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh 2025-05-07T19:43:02.7888029Z [SETUP] Installing Miniconda ... 2025-05-07T19:43:02.7888574Z + bash miniconda.sh -b -p /github/home/miniconda -u 2025-05-07T19:43:02.7888857Z 2025-05-07T19:43:02.8030035Z PREFIX=/github/home/miniconda 2025-05-07T19:43:03.1536477Z Unpacking payload ... 2025-05-07T19:43:03.6369903Z entry_point.py:256: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. 2025-05-07T19:43:04.3098733Z entry_point.py:256: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. 2025-05-07T19:43:06.1651975Z 2025-05-07T19:43:06.1652691Z Installing base environment... 2025-05-07T19:43:06.1653337Z 2025-05-07T19:43:07.1521984Z Preparing transaction: ...working... done 2025-05-07T19:43:10.0220311Z Executing transaction: ...working... done 2025-05-07T19:43:10.5698408Z entry_point.py:256: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. 2025-05-07T19:43:10.6385574Z installation finished. 2025-05-07T19:43:10.6396366Z 2025-05-07T19:43:10.6396717Z + rm -f miniconda.sh 2025-05-07T19:43:10.6396991Z 2025-05-07T19:43:10.6540097Z 2025-05-07T19:43:10.6540715Z [SETUP] Reloading the bash configuration ... 2025-05-07T19:43:10.6542519Z + /github/home/miniconda/bin/conda init bash 2025-05-07T19:43:10.6542755Z 2025-05-07T19:43:11.0157660Z no change /github/home/miniconda/condabin/conda 2025-05-07T19:43:11.0158841Z no change /github/home/miniconda/bin/conda 2025-05-07T19:43:11.0159885Z no change /github/home/miniconda/bin/conda-env 2025-05-07T19:43:11.0160972Z no change /github/home/miniconda/bin/activate 2025-05-07T19:43:11.0162015Z no change /github/home/miniconda/bin/deactivate 2025-05-07T19:43:11.0163198Z no change /github/home/miniconda/etc/profile.d/conda.sh 2025-05-07T19:43:11.0163885Z no change /github/home/miniconda/etc/fish/conf.d/conda.fish 2025-05-07T19:43:11.0164463Z no change /github/home/miniconda/shell/condabin/Conda.psm1 2025-05-07T19:43:11.0164902Z no change /github/home/miniconda/shell/condabin/conda-hook.ps1 2025-05-07T19:43:11.0165438Z no change /github/home/miniconda/lib/python3.13/site-packages/xontrib/conda.xsh 2025-05-07T19:43:11.0166285Z no change /github/home/miniconda/etc/profile.d/conda.csh 2025-05-07T19:43:11.0166666Z modified /github/home/.bashrc 2025-05-07T19:43:11.0166848Z 2025-05-07T19:43:11.0167068Z ==> For changes to take effect, close and re-open your current shell. <== 2025-05-07T19:43:11.0167364Z 2025-05-07T19:43:11.0691424Z 2025-05-07T19:43:11.0692154Z + . /github/home/.bashrc 2025-05-07T19:43:11.0692839Z 2025-05-07T19:43:11.8586837Z 2025-05-07T19:43:11.8587499Z [SETUP] Installing libmamba-solver (required since Anaconda 2024.02-1) and libarchive ... 2025-05-07T19:43:11.8614680Z [EXEC] [ATTEMPT 0/3] + conda install --solver=classic -c conda-forge --override-channels -y conda-libmamba-solver libmamba libmambapy libarchive 2025-05-07T19:43:23.4987992Z Collecting package metadata (current_repodata.json): - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / done 2025-05-07T19:43:24.9516250Z Solving environment: \ | / - \ | / - \ | / done 2025-05-07T19:43:25.0409953Z 2025-05-07T19:43:25.0410222Z ## Package Plan ## 2025-05-07T19:43:25.0410507Z 2025-05-07T19:43:25.0410723Z environment location: /github/home/miniconda 2025-05-07T19:43:25.0410970Z 2025-05-07T19:43:25.0411069Z added / updated specs: 2025-05-07T19:43:25.0411369Z - conda-libmamba-solver 2025-05-07T19:43:25.0411643Z - libarchive 2025-05-07T19:43:25.0411855Z - libmamba 2025-05-07T19:43:25.0412075Z - libmambapy 2025-05-07T19:43:25.0412205Z 2025-05-07T19:43:25.0412210Z 2025-05-07T19:43:25.0412335Z The following packages will be downloaded: 2025-05-07T19:43:25.0412572Z 2025-05-07T19:43:25.0412691Z package | build 2025-05-07T19:43:25.0413025Z ---------------------------|----------------- 2025-05-07T19:43:25.0413503Z ca-certificates-2025.4.26 | hbd8a1cb_0 149 KB conda-forge 2025-05-07T19:43:25.0414009Z certifi-2025.4.26 | pyhd8ed1ab_0 154 KB conda-forge 2025-05-07T19:43:25.0414486Z conda-25.3.1 | py313h78bf25f_1 1.1 MB conda-forge 2025-05-07T19:43:25.0415075Z conda-libmamba-solver-25.4.0| pyhd8ed1ab_0 41 KB conda-forge 2025-05-07T19:43:25.0415568Z ------------------------------------------------------------ 2025-05-07T19:43:25.0415983Z Total: 1.4 MB 2025-05-07T19:43:25.0416225Z 2025-05-07T19:43:25.0416344Z The following packages will be UPDATED: 2025-05-07T19:43:25.0416558Z 2025-05-07T19:43:25.0421217Z ca-certificates pkgs/main/linux-64::ca-certificates-2~ --> conda-forge/noarch::ca-certificates-2025.4.26-hbd8a1cb_0 2025-05-07T19:43:25.0422091Z conda pkgs/main::conda-25.3.1-py313h06a4308~ --> conda-forge::conda-25.3.1-py313h78bf25f_1 2025-05-07T19:43:25.0422756Z 2025-05-07T19:43:25.0422991Z The following packages will be SUPERSEDED by a higher-priority channel: 2025-05-07T19:43:25.0423330Z 2025-05-07T19:43:25.0423686Z certifi pkgs/main/linux-64::certifi-2025.4.26~ --> conda-forge/noarch::certifi-2025.4.26-pyhd8ed1ab_0 2025-05-07T19:43:25.0424537Z conda-libmamba-so~ pkgs/main::conda-libmamba-solver-25.4~ --> conda-forge::conda-libmamba-solver-25.4.0-pyhd8ed1ab_0 2025-05-07T19:43:25.0425070Z 2025-05-07T19:43:25.0425075Z 2025-05-07T19:43:25.0425078Z 2025-05-07T19:43:25.0425229Z Downloading and Extracting Packages: ...working... 2025-05-07T19:43:25.0425626Z conda-25.3.1 | 1.1 MB | | 0% 2025-05-07T19:43:25.0425864Z 2025-05-07T19:43:25.0426257Z certifi-2025.4.26 | 154 KB | | 0%  2025-05-07T19:43:25.0426528Z 2025-05-07T19:43:25.0426532Z 2025-05-07T19:43:25.0441407Z ca-certificates-2025 | 149 KB | | 0%  2025-05-07T19:43:25.0441753Z 2025-05-07T19:43:25.0442015Z 2025-05-07T19:43:25.0442292Z 2025-05-07T19:43:25.0959747Z conda-libmamba-solve | 41 KB | | 0%  2025-05-07T19:43:25.0960063Z 2025-05-07T19:43:25.0960236Z 2025-05-07T19:43:25.1064045Z ca-certificates-2025 | 149 KB | ########## | 100%  2025-05-07T19:43:25.1064901Z 2025-05-07T19:43:25.1064915Z 2025-05-07T19:43:25.1064927Z 2025-05-07T19:43:25.1068028Z conda-libmamba-solve | 41 KB | ########## | 100%  2025-05-07T19:43:25.1068888Z 2025-05-07T19:43:25.1070872Z 2025-05-07T19:43:25.1144603Z ca-certificates-2025 | 149 KB | ########## | 100%  2025-05-07T19:43:25.1144927Z 2025-05-07T19:43:25.1300888Z certifi-2025.4.26 | 154 KB | ########## | 100%  2025-05-07T19:43:25.1301187Z 2025-05-07T19:43:25.1301191Z 2025-05-07T19:43:25.1301196Z 2025-05-07T19:43:25.1332657Z conda-libmamba-solve | 41 KB | ########## | 100%  2025-05-07T19:43:25.1333139Z 2025-05-07T19:43:25.1358383Z certifi-2025.4.26 | 154 KB | ########## | 100%  2025-05-07T19:43:25.2304962Z conda-25.3.1 | 1.1 MB | ########## | 100% 2025-05-07T19:43:25.2306121Z conda-25.3.1 | 1.1 MB | ########## | 100% 2025-05-07T19:43:25.2318420Z conda-25.3.1 | 1.1 MB | ########## | 100% 2025-05-07T19:43:25.2319247Z 2025-05-07T19:43:25.2319541Z 2025-05-07T19:43:25.2319806Z  2025-05-07T19:43:25.2320045Z 2025-05-07T19:43:25.2320050Z 2025-05-07T19:43:25.2320223Z  2025-05-07T19:43:25.2320439Z 2025-05-07T19:43:25.2320444Z 2025-05-07T19:43:25.2320447Z 2025-05-07T19:43:25.2320644Z  done 2025-05-07T19:43:25.3326077Z Preparing transaction: \ done 2025-05-07T19:43:25.4335200Z Verifying transaction: / done 2025-05-07T19:43:26.7366562Z Executing transaction: \ | / - \ | / - \ | / - \ done 2025-05-07T19:43:28.3039527Z [SETUP] Updating Miniconda base packages ... 2025-05-07T19:43:28.3060464Z [EXEC] [ATTEMPT 0/3] + conda update -n base -c defaults --update-deps -y conda 2025-05-07T19:43:29.0356492Z Channels: 2025-05-07T19:43:29.0356777Z - defaults 2025-05-07T19:43:29.0356995Z Platform: linux-64 2025-05-07T19:43:30.1042694Z Collecting package metadata (repodata.json): - \ | / - \ done 2025-05-07T19:43:30.2325012Z Solving environment: / - Channels: 2025-05-07T19:43:30.2325409Z - defaults 2025-05-07T19:43:30.2325712Z Platform: linux-64 2025-05-07T19:43:30.5715469Z Collecting package metadata (repodata.json): | / - \ | / done 2025-05-07T19:43:30.7942764Z Solving environment: \ | / - done 2025-05-07T19:43:30.8897450Z done 2025-05-07T19:43:30.9526731Z 2025-05-07T19:43:30.9527007Z ## Package Plan ## 2025-05-07T19:43:30.9527197Z 2025-05-07T19:43:30.9527412Z environment location: /github/home/miniconda 2025-05-07T19:43:30.9527679Z 2025-05-07T19:43:30.9527779Z added / updated specs: 2025-05-07T19:43:30.9528456Z - conda 2025-05-07T19:43:30.9528581Z 2025-05-07T19:43:30.9528585Z 2025-05-07T19:43:30.9528708Z The following packages will be downloaded: 2025-05-07T19:43:30.9528949Z 2025-05-07T19:43:30.9529067Z package | build 2025-05-07T19:43:30.9529411Z ---------------------------|----------------- 2025-05-07T19:43:30.9529764Z pip-25.1 | pyhc872135_2 1.3 MB 2025-05-07T19:43:30.9530173Z tzdata-2025b | h04d1e81_0 116 KB 2025-05-07T19:43:30.9530556Z ------------------------------------------------------------ 2025-05-07T19:43:30.9530914Z Total: 1.4 MB 2025-05-07T19:43:30.9531131Z 2025-05-07T19:43:30.9531246Z The following packages will be UPDATED: 2025-05-07T19:43:30.9531477Z 2025-05-07T19:43:30.9531797Z pip pkgs/main/linux-64::pip-25.0-py313h06~ --> pkgs/main/noarch::pip-25.1-pyhc872135_2 2025-05-07T19:43:30.9532528Z tzdata 2025a-h04d1e81_0 --> 2025b-h04d1e81_0 2025-05-07T19:43:30.9532795Z 2025-05-07T19:43:30.9532799Z 2025-05-07T19:43:30.9532803Z 2025-05-07T19:43:30.9532949Z Downloading and Extracting Packages: ...working... 2025-05-07T19:43:30.9533334Z pip-25.1 | 1.3 MB | | 0% 2025-05-07T19:43:30.9533566Z 2025-05-07T19:43:31.0052167Z tzdata-2025b | 116 KB | | 0%  2025-05-07T19:43:31.0052962Z 2025-05-07T19:43:31.0098172Z tzdata-2025b | 116 KB | ########## | 100%  2025-05-07T19:43:31.2057291Z pip-25.1 | 1.3 MB | ########## | 100% 2025-05-07T19:43:31.2059546Z pip-25.1 | 1.3 MB | ########## | 100% 2025-05-07T19:43:31.2077169Z pip-25.1 | 1.3 MB | ########## | 100% 2025-05-07T19:43:31.2077757Z 2025-05-07T19:43:31.2078340Z tzdata-2025b | 116 KB | ########## | 100%  2025-05-07T19:43:31.2078653Z 2025-05-07T19:43:31.2080049Z tzdata-2025b | 116 KB | ########## | 100%  2025-05-07T19:43:31.2080433Z 2025-05-07T19:43:31.2080648Z 2025-05-07T19:43:31.2083821Z  done 2025-05-07T19:43:31.3088902Z Preparing transaction: | done 2025-05-07T19:43:31.4099221Z Verifying transaction: - done 2025-05-07T19:43:33.4147501Z Executing transaction: | / - \ | / - \ | / - \ | / - \ | / - \ done 2025-05-07T19:43:33.9561408Z [SETUP] Cleaning up Conda packages ... 2025-05-07T19:43:33.9567152Z + conda clean --packages --tarball -y 2025-05-07T19:43:33.9567391Z 2025-05-07T19:43:34.4056449Z Will remove 99 (117.8 MB) tarball(s). 2025-05-07T19:43:34.4057411Z Will remove 11 (16.0 MB) package(s). 2025-05-07T19:43:34.4616493Z 2025-05-07T19:43:34.4631163Z + conda clean --all -y 2025-05-07T19:43:34.4631539Z 2025-05-07T19:43:34.9189447Z There are no unused tarball(s) to remove. 2025-05-07T19:43:34.9189890Z Will remove 1 index cache(s). 2025-05-07T19:43:34.9190213Z There are no unused package(s) to remove. 2025-05-07T19:43:34.9190551Z There are no tempfile(s) to remove. 2025-05-07T19:43:34.9190852Z There are no logfile(s) to remove. 2025-05-07T19:43:34.9754952Z 2025-05-07T19:43:34.9755940Z + conda info 2025-05-07T19:43:34.9756118Z 2025-05-07T19:43:35.5360510Z 2025-05-07T19:43:35.5360939Z active environment : base 2025-05-07T19:43:35.5361308Z active env location : /github/home/miniconda 2025-05-07T19:43:35.5361784Z shell level : 1 2025-05-07T19:43:35.5362065Z user config file : /github/home/.condarc 2025-05-07T19:43:35.5362473Z populated config files : /github/home/miniconda/.condarc 2025-05-07T19:43:35.5362851Z conda version : 25.3.1 2025-05-07T19:43:35.5363153Z conda-build version : not installed 2025-05-07T19:43:35.5363469Z python version : 3.13.2.final.0 2025-05-07T19:43:35.5363769Z solver : libmamba (default) 2025-05-07T19:43:35.5364139Z virtual packages : __archspec=1=cascadelake 2025-05-07T19:43:35.5364790Z __conda=25.3.1=0 2025-05-07T19:43:35.5365110Z __glibc=2.34=0 2025-05-07T19:43:35.5365396Z __linux=6.1.130=0 2025-05-07T19:43:35.5365694Z __unix=0=0 2025-05-07T19:43:35.5366045Z base environment : /github/home/miniconda (writable) 2025-05-07T19:43:35.5366452Z conda av data dir : /github/home/miniconda/etc/conda 2025-05-07T19:43:35.5366813Z conda av metadata url : None 2025-05-07T19:43:35.5367188Z channel URLs : https://repo.anaconda.com/pkgs/main/linux-64 2025-05-07T19:43:35.5367642Z https://repo.anaconda.com/pkgs/main/noarch 2025-05-07T19:43:35.5368033Z https://repo.anaconda.com/pkgs/r/linux-64 2025-05-07T19:43:35.5368433Z https://repo.anaconda.com/pkgs/r/noarch 2025-05-07T19:43:35.5368819Z package cache : /github/home/miniconda/pkgs 2025-05-07T19:43:35.5369317Z /github/home/.conda/pkgs 2025-05-07T19:43:35.5369693Z envs directories : /github/home/miniconda/envs 2025-05-07T19:43:35.5370037Z /github/home/.conda/envs 2025-05-07T19:43:35.5370369Z platform : linux-64 2025-05-07T19:43:35.5371268Z user-agent : conda/25.3.1 requests/2.32.3 CPython/3.13.2 Linux/6.1.130-139.222.amzn2023.x86_64 amzn/2023.7.20250428 glibc/2.34 solver/libmamba conda-libmamba-solver/25.4.0 libmambapy/2.0.5 aau/0.7.0 c/. s/. e/. 2025-05-07T19:43:35.5372179Z UID:GID : 0:0 2025-05-07T19:43:35.5372451Z netrc file : None 2025-05-07T19:43:35.5372713Z offline mode : False 2025-05-07T19:43:35.5372883Z 2025-05-07T19:43:35.5947191Z 2025-05-07T19:43:35.5947641Z [SETUP] Exporting Miniconda variables ... 2025-05-07T19:43:35.5948353Z [SETUP] Saving Miniconda variables to /__w/_temp/_runner_file_commands/add_path_edb5f5c8-419e-4dbd-83af-c52c10ae2e85 ... 2025-05-07T19:43:35.5949131Z [SETUP] Successfully set up Miniconda at /github/home/miniconda 2025-05-07T19:43:35.6115821Z ##[group]Run . $PRELUDE; create_conda_environment $BUILD_ENV 3.13 2025-05-07T19:43:35.6116410Z . $PRELUDE; create_conda_environment $BUILD_ENV 3.13 2025-05-07T19:43:35.6117285Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:43:35.6117656Z env: 2025-05-07T19:43:35.6117903Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:43:35.6118246Z BUILD_ENV: build_binary 2025-05-07T19:43:35.6118506Z BUILD_TARGET: genai 2025-05-07T19:43:35.6118778Z BUILD_VARIANT: cuda 2025-05-07T19:43:35.6119029Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:43:35.6119313Z ##[endgroup] 2025-05-07T19:43:36.0514343Z ################################################################################ 2025-05-07T19:43:36.0515260Z # Create Conda Environment 2025-05-07T19:43:36.0515612Z # 2025-05-07T19:43:36.0537291Z # [2025-05-07T19:43:36.053Z] + create_conda_environment build_binary 3.13 2025-05-07T19:43:36.0537871Z ################################################################################ 2025-05-07T19:43:36.0538109Z 2025-05-07T19:43:36.0553363Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:43:36.1452256Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:43:36.1453029Z [SETUP] Listing existing Conda environments ... 2025-05-07T19:43:36.1453396Z + conda info --envs 2025-05-07T19:43:36.1453541Z 2025-05-07T19:43:36.7047234Z 2025-05-07T19:43:36.7047709Z # conda environments: 2025-05-07T19:43:36.7048431Z # 2025-05-07T19:43:36.7049061Z base /github/home/miniconda 2025-05-07T19:43:36.7049705Z 2025-05-07T19:43:36.7628824Z 2025-05-07T19:43:36.7629565Z [SETUP] Deleting the prefix directory if it exists ... 2025-05-07T19:43:38.3783533Z + rm -rf /github/home/miniconda/envs/build_binary 2025-05-07T19:43:38.3784333Z 2025-05-07T19:43:38.3802083Z 2025-05-07T19:43:38.3811175Z [SETUP] Creating new Conda environment (Python 3.13) ... 2025-05-07T19:43:38.3834136Z [EXEC] [ATTEMPT 0/3] + conda create -y -n build_binary python=3.13 2025-05-07T19:43:38.9457483Z Channels: 2025-05-07T19:43:38.9458154Z - defaults 2025-05-07T19:43:38.9458776Z Platform: linux-64 2025-05-07T19:43:40.3287022Z Collecting package metadata (repodata.json): - \ | / - \ | / - done 2025-05-07T19:43:40.4293916Z Solving environment: | done 2025-05-07T19:43:40.4581228Z 2025-05-07T19:43:40.4581684Z ## Package Plan ## 2025-05-07T19:43:40.4582171Z 2025-05-07T19:43:40.4582767Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:43:40.4583735Z 2025-05-07T19:43:40.4583858Z added / updated specs: 2025-05-07T19:43:40.4584117Z - python=3.13 2025-05-07T19:43:40.4584256Z 2025-05-07T19:43:40.4584261Z 2025-05-07T19:43:40.4584407Z The following packages will be downloaded: 2025-05-07T19:43:40.4584676Z 2025-05-07T19:43:40.4584799Z package | build 2025-05-07T19:43:40.4585156Z ---------------------------|----------------- 2025-05-07T19:43:40.4585595Z _libgcc_mutex-0.1 | main 3 KB 2025-05-07T19:43:40.4586018Z _openmp_mutex-5.1 | 1_gnu 21 KB 2025-05-07T19:43:40.4586474Z ca-certificates-2025.2.25 | h06a4308_0 129 KB 2025-05-07T19:43:40.4586908Z python_abi-3.13 | 0_cp313 6 KB 2025-05-07T19:43:40.4587318Z ------------------------------------------------------------ 2025-05-07T19:43:40.4587710Z Total: 159 KB 2025-05-07T19:43:40.4587947Z 2025-05-07T19:43:40.4588085Z The following NEW packages will be INSTALLED: 2025-05-07T19:43:40.4588325Z 2025-05-07T19:43:40.4588567Z _libgcc_mutex pkgs/main/linux-64::_libgcc_mutex-0.1-main 2025-05-07T19:43:40.4589034Z _openmp_mutex pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu 2025-05-07T19:43:40.4589490Z bzip2 pkgs/main/linux-64::bzip2-1.0.8-h5eee18b_6 2025-05-07T19:43:40.4590359Z ca-certificates pkgs/main/linux-64::ca-certificates-2025.2.25-h06a4308_0 2025-05-07T19:43:40.4590903Z expat pkgs/main/linux-64::expat-2.7.1-h6a678d5_0 2025-05-07T19:43:40.4591395Z ld_impl_linux-64 pkgs/main/linux-64::ld_impl_linux-64-2.40-h12ee557_0 2025-05-07T19:43:40.4591880Z libffi pkgs/main/linux-64::libffi-3.4.4-h6a678d5_1 2025-05-07T19:43:40.4592347Z libgcc-ng pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1 2025-05-07T19:43:40.4592807Z libgomp pkgs/main/linux-64::libgomp-11.2.0-h1234567_1 2025-05-07T19:43:40.4593279Z libmpdec pkgs/main/linux-64::libmpdec-4.0.0-h5eee18b_0 2025-05-07T19:43:40.4593779Z libstdcxx-ng pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1 2025-05-07T19:43:40.4594257Z libuuid pkgs/main/linux-64::libuuid-1.41.5-h5eee18b_0 2025-05-07T19:43:40.4594711Z ncurses pkgs/main/linux-64::ncurses-6.4-h6a678d5_0 2025-05-07T19:43:40.4595149Z openssl pkgs/main/linux-64::openssl-3.0.16-h5eee18b_0 2025-05-07T19:43:40.4595595Z pip pkgs/main/noarch::pip-25.1-pyhc872135_2 2025-05-07T19:43:40.4596047Z python pkgs/main/linux-64::python-3.13.2-hf623796_100_cp313 2025-05-07T19:43:40.4596512Z python_abi pkgs/main/linux-64::python_abi-3.13-0_cp313 2025-05-07T19:43:40.4596971Z readline pkgs/main/linux-64::readline-8.2-h5eee18b_0 2025-05-07T19:43:40.4597577Z setuptools pkgs/main/linux-64::setuptools-78.1.1-py313h06a4308_0 2025-05-07T19:43:40.4598170Z sqlite pkgs/main/linux-64::sqlite-3.45.3-h5eee18b_0 2025-05-07T19:43:40.4598562Z tk pkgs/main/linux-64::tk-8.6.14-h39e8969_0 2025-05-07T19:43:40.4598934Z tzdata pkgs/main/noarch::tzdata-2025b-h04d1e81_0 2025-05-07T19:43:40.4599350Z wheel pkgs/main/linux-64::wheel-0.45.1-py313h06a4308_0 2025-05-07T19:43:40.4599730Z xz pkgs/main/linux-64::xz-5.6.4-h5eee18b_1 2025-05-07T19:43:40.4600249Z zlib pkgs/main/linux-64::zlib-1.2.13-h5eee18b_1 2025-05-07T19:43:40.4600489Z 2025-05-07T19:43:40.4600493Z 2025-05-07T19:43:40.4600497Z 2025-05-07T19:43:40.4600659Z Downloading and Extracting Packages: ...working... 2025-05-07T19:43:40.4601030Z ca-certificates-2025 | 129 KB | | 0% 2025-05-07T19:43:40.4601262Z 2025-05-07T19:43:40.4601609Z _openmp_mutex-5.1 | 21 KB | | 0%  2025-05-07T19:43:40.4602028Z 2025-05-07T19:43:40.4602033Z 2025-05-07T19:43:40.4611741Z python_abi-3.13 | 6 KB | | 0%  2025-05-07T19:43:40.4612067Z 2025-05-07T19:43:40.4612072Z 2025-05-07T19:43:40.4612076Z 2025-05-07T19:43:40.4945897Z _libgcc_mutex-0.1 | 3 KB | | 0%  2025-05-07T19:43:40.4946740Z 2025-05-07T19:43:40.4946755Z 2025-05-07T19:43:40.5012206Z python_abi-3.13 | 6 KB | ########## | 100%  2025-05-07T19:43:40.5013043Z 2025-05-07T19:43:40.5013697Z _openmp_mutex-5.1 | 21 KB | ########## | 100%  2025-05-07T19:43:40.5019741Z ca-certificates-2025 | 129 KB | ########## | 100% 2025-05-07T19:43:40.5020505Z 2025-05-07T19:43:40.5020518Z 2025-05-07T19:43:40.5123715Z python_abi-3.13 | 6 KB | ########## | 100%  2025-05-07T19:43:40.5159616Z ca-certificates-2025 | 129 KB | ########## | 100% 2025-05-07T19:43:40.5159936Z 2025-05-07T19:43:40.5159941Z 2025-05-07T19:43:40.5159945Z 2025-05-07T19:43:40.5217668Z _libgcc_mutex-0.1 | 3 KB | ########## | 100%  2025-05-07T19:43:40.5218533Z 2025-05-07T19:43:40.5218563Z 2025-05-07T19:43:40.5218575Z 2025-05-07T19:43:40.5225762Z _libgcc_mutex-0.1 | 3 KB | ########## | 100%  2025-05-07T19:43:40.5226618Z 2025-05-07T19:43:40.5228079Z _openmp_mutex-5.1 | 21 KB | ########## | 100%  2025-05-07T19:43:40.5229183Z 2025-05-07T19:43:40.5229588Z 2025-05-07T19:43:40.5229775Z  2025-05-07T19:43:40.5230019Z 2025-05-07T19:43:40.5230043Z 2025-05-07T19:43:40.5230513Z  2025-05-07T19:43:40.5230750Z 2025-05-07T19:43:40.5230755Z 2025-05-07T19:43:40.5230760Z 2025-05-07T19:43:40.5230982Z  done 2025-05-07T19:43:40.7291056Z Preparing transaction: - \ done 2025-05-07T19:43:42.2819098Z Verifying transaction: / - \ | / - \ | / - \ | / - done 2025-05-07T19:43:44.4972663Z Executing transaction: | / - \ | / - \ | / - \ | / - \ | / - \ | / done 2025-05-07T19:43:44.5010886Z # 2025-05-07T19:43:44.5011622Z # To activate this environment, use 2025-05-07T19:43:44.5012278Z # 2025-05-07T19:43:44.5012510Z # $ conda activate build_binary 2025-05-07T19:43:44.5012792Z # 2025-05-07T19:43:44.5013040Z # To deactivate an active environment, use 2025-05-07T19:43:44.5013340Z # 2025-05-07T19:43:44.5013571Z # $ conda deactivate 2025-05-07T19:43:44.5013753Z 2025-05-07T19:43:44.5880142Z [SETUP] Upgrading PIP to latest ... 2025-05-07T19:43:44.5912851Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install --upgrade pip 2025-05-07T19:43:47.5515515Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-05-07T19:43:47.5517195Z 2025-05-07T19:43:47.5517629Z Requirement already satisfied: pip in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (25.1) 2025-05-07T19:43:47.5518271Z Collecting pip 2025-05-07T19:43:47.5518593Z Downloading pip-25.1.1-py3-none-any.whl.metadata (3.6 kB) 2025-05-07T19:43:47.5519035Z Downloading pip-25.1.1-py3-none-any.whl (1.8 MB) 2025-05-07T19:43:47.5520412Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 70.0 MB/s eta 0:00:00 2025-05-07T19:43:47.5520818Z Installing collected packages: pip 2025-05-07T19:43:47.5521111Z Attempting uninstall: pip 2025-05-07T19:43:47.5521397Z Found existing installation: pip 25.1 2025-05-07T19:43:47.5521698Z Uninstalling pip-25.1: 2025-05-07T19:43:47.5521982Z Successfully uninstalled pip-25.1 2025-05-07T19:43:47.5522279Z Successfully installed pip-25.1.1 2025-05-07T19:43:47.5522483Z 2025-05-07T19:43:47.6323270Z [SETUP] Upgrading pyOpenSSL ... 2025-05-07T19:43:47.6348302Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y pyOpenSSL>22.1.0 2025-05-07T19:43:48.2912418Z Channels: 2025-05-07T19:43:48.2913109Z - conda-forge 2025-05-07T19:43:48.2913745Z Platform: linux-64 2025-05-07T19:43:58.0675546Z Collecting package metadata (repodata.json): - \ | / - \ | / - \ | / - \ | / - done 2025-05-07T19:43:59.9670106Z Solving environment: | / - \ | done 2025-05-07T19:44:00.0156259Z 2025-05-07T19:44:00.0156933Z ## Package Plan ## 2025-05-07T19:44:00.0157523Z 2025-05-07T19:44:00.0158110Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:00.0159030Z 2025-05-07T19:44:00.0159334Z added / updated specs: 2025-05-07T19:44:00.0160097Z - pyopenssl[version='>22.1.0'] 2025-05-07T19:44:00.0160659Z 2025-05-07T19:44:00.0160672Z 2025-05-07T19:44:00.0161043Z The following packages will be downloaded: 2025-05-07T19:44:00.0161687Z 2025-05-07T19:44:00.0162023Z package | build 2025-05-07T19:44:00.0162985Z ---------------------------|----------------- 2025-05-07T19:44:00.0164054Z cffi-1.17.1 | py313hfab6e84_0 289 KB conda-forge 2025-05-07T19:44:00.0165406Z cryptography-44.0.3 | py313h6556f6e_0 1.5 MB conda-forge 2025-05-07T19:44:00.0166113Z libgcc-15.1.0 | h767d61c_2 810 KB conda-forge 2025-05-07T19:44:00.0166557Z libgcc-ng-15.1.0 | h69a702a_2 34 KB conda-forge 2025-05-07T19:44:00.0167320Z libgomp-15.1.0 | h767d61c_2 442 KB conda-forge 2025-05-07T19:44:00.0167762Z openssl-3.5.0 | h7b32b05_1 3.0 MB conda-forge 2025-05-07T19:44:00.0168225Z pycparser-2.22 | pyh29332c3_1 108 KB conda-forge 2025-05-07T19:44:00.0168809Z pyopenssl-25.0.0 | pyhd8ed1ab_0 120 KB conda-forge 2025-05-07T19:44:00.0169408Z typing-extensions-4.13.2 | h0e9735f_0 88 KB conda-forge 2025-05-07T19:44:00.0169897Z typing_extensions-4.13.2 | pyh29332c3_0 51 KB conda-forge 2025-05-07T19:44:00.0170311Z ------------------------------------------------------------ 2025-05-07T19:44:00.0170668Z Total: 6.4 MB 2025-05-07T19:44:00.0171059Z 2025-05-07T19:44:00.0171197Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:00.0171453Z 2025-05-07T19:44:00.0171835Z cffi conda-forge/linux-64::cffi-1.17.1-py313hfab6e84_0 2025-05-07T19:44:00.0172430Z cryptography conda-forge/linux-64::cryptography-44.0.3-py313h6556f6e_0 2025-05-07T19:44:00.0172947Z libgcc conda-forge/linux-64::libgcc-15.1.0-h767d61c_2 2025-05-07T19:44:00.0176036Z pycparser conda-forge/noarch::pycparser-2.22-pyh29332c3_1 2025-05-07T19:44:00.0176606Z pyopenssl conda-forge/noarch::pyopenssl-25.0.0-pyhd8ed1ab_0 2025-05-07T19:44:00.0177187Z typing-extensions conda-forge/noarch::typing-extensions-4.13.2-h0e9735f_0 2025-05-07T19:44:00.0177824Z typing_extensions conda-forge/noarch::typing_extensions-4.13.2-pyh29332c3_0 2025-05-07T19:44:00.0178187Z 2025-05-07T19:44:00.0178308Z The following packages will be UPDATED: 2025-05-07T19:44:00.0178544Z 2025-05-07T19:44:00.0178962Z ca-certificates pkgs/main/linux-64::ca-certificates-2~ --> conda-forge/noarch::ca-certificates-2025.4.26-hbd8a1cb_0 2025-05-07T19:44:00.0179994Z libgcc-ng pkgs/main::libgcc-ng-11.2.0-h1234567_1 --> conda-forge::libgcc-ng-15.1.0-h69a702a_2 2025-05-07T19:44:00.0180710Z libgomp pkgs/main::libgomp-11.2.0-h1234567_1 --> conda-forge::libgomp-15.1.0-h767d61c_2 2025-05-07T19:44:00.0181407Z openssl pkgs/main::openssl-3.0.16-h5eee18b_0 --> conda-forge::openssl-3.5.0-h7b32b05_1 2025-05-07T19:44:00.0181800Z 2025-05-07T19:44:00.0181804Z 2025-05-07T19:44:00.0181808Z 2025-05-07T19:44:00.0181959Z Downloading and Extracting Packages: ...working... 2025-05-07T19:44:00.0182659Z openssl-3.5.0 | 3.0 MB | | 0% 2025-05-07T19:44:00.0182903Z 2025-05-07T19:44:00.0183277Z cryptography-44.0.3 | 1.5 MB | | 0%  2025-05-07T19:44:00.0183545Z 2025-05-07T19:44:00.0183549Z 2025-05-07T19:44:00.0183761Z libgcc-15.1.0 | 810 KB | | 0%  2025-05-07T19:44:00.0184038Z 2025-05-07T19:44:00.0184042Z 2025-05-07T19:44:00.0184046Z 2025-05-07T19:44:00.0184275Z libgomp-15.1.0 | 442 KB | | 0%  2025-05-07T19:44:00.0184560Z 2025-05-07T19:44:00.0184564Z 2025-05-07T19:44:00.0184568Z 2025-05-07T19:44:00.0195676Z 2025-05-07T19:44:00.0196653Z cffi-1.17.1 | 289 KB | | 0%  2025-05-07T19:44:00.0197498Z 2025-05-07T19:44:00.0197510Z 2025-05-07T19:44:00.0197523Z 2025-05-07T19:44:00.0197533Z 2025-05-07T19:44:00.0197582Z 2025-05-07T19:44:00.0198340Z pyopenssl-25.0.0 | 120 KB | | 0%  2025-05-07T19:44:00.0199198Z 2025-05-07T19:44:00.0199208Z 2025-05-07T19:44:00.0199219Z 2025-05-07T19:44:00.0199230Z 2025-05-07T19:44:00.0199240Z 2025-05-07T19:44:00.0199250Z 2025-05-07T19:44:00.0199963Z pycparser-2.22 | 108 KB | | 0%  2025-05-07T19:44:00.0200783Z 2025-05-07T19:44:00.0200794Z 2025-05-07T19:44:00.0200805Z 2025-05-07T19:44:00.0200839Z 2025-05-07T19:44:00.0200849Z 2025-05-07T19:44:00.0200860Z 2025-05-07T19:44:00.0200904Z 2025-05-07T19:44:00.0202158Z typing-extensions-4. | 88 KB | | 0%  2025-05-07T19:44:00.0202679Z 2025-05-07T19:44:00.0202683Z 2025-05-07T19:44:00.0202686Z 2025-05-07T19:44:00.0202690Z 2025-05-07T19:44:00.0202714Z 2025-05-07T19:44:00.0202717Z 2025-05-07T19:44:00.0202721Z 2025-05-07T19:44:00.0202724Z 2025-05-07T19:44:00.0203016Z typing_extensions-4. | 51 KB | | 0%  2025-05-07T19:44:00.0203324Z 2025-05-07T19:44:00.0203327Z 2025-05-07T19:44:00.0203331Z 2025-05-07T19:44:00.0203334Z 2025-05-07T19:44:00.0203338Z 2025-05-07T19:44:00.0203341Z 2025-05-07T19:44:00.0203363Z 2025-05-07T19:44:00.0203367Z 2025-05-07T19:44:00.0203370Z 2025-05-07T19:44:00.0983997Z libgcc-ng-15.1.0 | 34 KB | | 0%  2025-05-07T19:44:00.0984908Z 2025-05-07T19:44:00.0984923Z 2025-05-07T19:44:00.0984935Z 2025-05-07T19:44:00.1098353Z libgomp-15.1.0 | 442 KB | ########## | 100%  2025-05-07T19:44:00.1099230Z 2025-05-07T19:44:00.1166456Z cryptography-44.0.3 | 1.5 MB | ########## | 100%  2025-05-07T19:44:00.1166788Z 2025-05-07T19:44:00.1166793Z 2025-05-07T19:44:00.1241405Z libgcc-15.1.0 | 810 KB | ##9 | 30%  2025-05-07T19:44:00.1241842Z openssl-3.5.0 | 3.0 MB | ########## | 100% 2025-05-07T19:44:00.1302401Z openssl-3.5.0 | 3.0 MB | ########## | 100% 2025-05-07T19:44:00.1302871Z 2025-05-07T19:44:00.1359736Z 2025-05-07T19:44:00.1360736Z libgcc-15.1.0 | 810 KB | ########## | 100%  2025-05-07T19:44:00.1361557Z 2025-05-07T19:44:00.1361569Z 2025-05-07T19:44:00.1361580Z 2025-05-07T19:44:00.1361590Z 2025-05-07T19:44:00.1361600Z 2025-05-07T19:44:00.1363654Z pyopenssl-25.0.0 | 120 KB | #3 | 13%  2025-05-07T19:44:00.1364497Z 2025-05-07T19:44:00.1364509Z 2025-05-07T19:44:00.1365135Z 2025-05-07T19:44:00.1370225Z libgomp-15.1.0 | 442 KB | ########## | 100%  2025-05-07T19:44:00.1370843Z 2025-05-07T19:44:00.1370863Z 2025-05-07T19:44:00.1370867Z 2025-05-07T19:44:00.1394834Z libgomp-15.1.0 | 442 KB | ########## | 100%  2025-05-07T19:44:00.1395330Z 2025-05-07T19:44:00.1395335Z 2025-05-07T19:44:00.1395339Z 2025-05-07T19:44:00.1395343Z 2025-05-07T19:44:00.1395347Z 2025-05-07T19:44:00.1457667Z pyopenssl-25.0.0 | 120 KB | ########## | 100%  2025-05-07T19:44:00.1458029Z 2025-05-07T19:44:00.1458034Z 2025-05-07T19:44:00.1458038Z 2025-05-07T19:44:00.1458042Z 2025-05-07T19:44:00.1458046Z 2025-05-07T19:44:00.1458050Z 2025-05-07T19:44:00.1486580Z pycparser-2.22 | 108 KB | #4 | 15%  2025-05-07T19:44:00.1487499Z 2025-05-07T19:44:00.1487513Z 2025-05-07T19:44:00.1487525Z 2025-05-07T19:44:00.1487536Z 2025-05-07T19:44:00.1487547Z 2025-05-07T19:44:00.1487576Z 2025-05-07T19:44:00.1708322Z pycparser-2.22 | 108 KB | ########## | 100%  2025-05-07T19:44:00.1709240Z 2025-05-07T19:44:00.1709321Z 2025-05-07T19:44:00.1709333Z 2025-05-07T19:44:00.1709344Z 2025-05-07T19:44:00.1709372Z 2025-05-07T19:44:00.1709383Z 2025-05-07T19:44:00.1709394Z 2025-05-07T19:44:00.1724785Z typing-extensions-4. | 88 KB | #8 | 18%  2025-05-07T19:44:00.1725312Z 2025-05-07T19:44:00.1725358Z 2025-05-07T19:44:00.1725362Z 2025-05-07T19:44:00.1725366Z 2025-05-07T19:44:00.1725370Z 2025-05-07T19:44:00.1725373Z 2025-05-07T19:44:00.1725377Z 2025-05-07T19:44:00.1739409Z typing-extensions-4. | 88 KB | ########## | 100%  2025-05-07T19:44:00.1740383Z 2025-05-07T19:44:00.1740419Z 2025-05-07T19:44:00.1740431Z 2025-05-07T19:44:00.1740441Z 2025-05-07T19:44:00.1740452Z 2025-05-07T19:44:00.1740462Z 2025-05-07T19:44:00.1740473Z 2025-05-07T19:44:00.1740483Z 2025-05-07T19:44:00.1758863Z typing_extensions-4. | 51 KB | ###1 | 31%  2025-05-07T19:44:00.1759845Z 2025-05-07T19:44:00.1759885Z 2025-05-07T19:44:00.1759931Z 2025-05-07T19:44:00.1759943Z 2025-05-07T19:44:00.1759953Z 2025-05-07T19:44:00.1759963Z 2025-05-07T19:44:00.1760404Z 2025-05-07T19:44:00.1760417Z 2025-05-07T19:44:00.1847262Z typing_extensions-4. | 51 KB | ########## | 100%  2025-05-07T19:44:00.1848238Z 2025-05-07T19:44:00.1848253Z 2025-05-07T19:44:00.1848289Z 2025-05-07T19:44:00.1848300Z 2025-05-07T19:44:00.1860621Z cffi-1.17.1 | 289 KB | 5 | 6%  2025-05-07T19:44:00.1861547Z 2025-05-07T19:44:00.1861551Z 2025-05-07T19:44:00.1861555Z 2025-05-07T19:44:00.1861559Z 2025-05-07T19:44:00.1861562Z 2025-05-07T19:44:00.1861567Z 2025-05-07T19:44:00.1861588Z 2025-05-07T19:44:00.1861592Z 2025-05-07T19:44:00.1861595Z 2025-05-07T19:44:00.1877756Z libgcc-ng-15.1.0 | 34 KB | ####7 | 47%  2025-05-07T19:44:00.1878108Z 2025-05-07T19:44:00.1878113Z 2025-05-07T19:44:00.1878117Z 2025-05-07T19:44:00.1878121Z 2025-05-07T19:44:00.1878124Z 2025-05-07T19:44:00.1878165Z 2025-05-07T19:44:00.1878169Z 2025-05-07T19:44:00.1878173Z 2025-05-07T19:44:00.1878298Z 2025-05-07T19:44:00.1953345Z libgcc-ng-15.1.0 | 34 KB | ########## | 100%  2025-05-07T19:44:00.1954258Z 2025-05-07T19:44:00.1954288Z 2025-05-07T19:44:00.2178952Z libgcc-15.1.0 | 810 KB | ########## | 100%  2025-05-07T19:44:00.2179753Z 2025-05-07T19:44:00.2179786Z 2025-05-07T19:44:00.2179798Z 2025-05-07T19:44:00.2179811Z 2025-05-07T19:44:00.2179822Z 2025-05-07T19:44:00.2243544Z pyopenssl-25.0.0 | 120 KB | ########## | 100%  2025-05-07T19:44:00.2244500Z 2025-05-07T19:44:00.2244506Z 2025-05-07T19:44:00.2244510Z 2025-05-07T19:44:00.2244514Z 2025-05-07T19:44:00.2957674Z cffi-1.17.1 | 289 KB | ########## | 100%  2025-05-07T19:44:00.2957989Z 2025-05-07T19:44:00.2957995Z 2025-05-07T19:44:00.2957999Z 2025-05-07T19:44:00.2958003Z 2025-05-07T19:44:00.2958007Z 2025-05-07T19:44:00.2958010Z 2025-05-07T19:44:00.2958618Z pycparser-2.22 | 108 KB | ########## | 100%  2025-05-07T19:44:00.2958990Z 2025-05-07T19:44:00.2958994Z 2025-05-07T19:44:00.2958998Z 2025-05-07T19:44:00.2959002Z 2025-05-07T19:44:00.2959005Z 2025-05-07T19:44:00.2959009Z 2025-05-07T19:44:00.3005958Z pycparser-2.22 | 108 KB | ########## | 100%  2025-05-07T19:44:00.3081745Z openssl-3.5.0 | 3.0 MB | ########## | 100% 2025-05-07T19:44:00.3082257Z 2025-05-07T19:44:00.3082284Z 2025-05-07T19:44:00.3082288Z 2025-05-07T19:44:00.3082465Z 2025-05-07T19:44:00.3082510Z 2025-05-07T19:44:00.3082524Z 2025-05-07T19:44:00.3082618Z 2025-05-07T19:44:00.3083287Z typing-extensions-4. | 88 KB | ########## | 100%  2025-05-07T19:44:00.3083631Z 2025-05-07T19:44:00.3083635Z 2025-05-07T19:44:00.3083638Z 2025-05-07T19:44:00.3083652Z 2025-05-07T19:44:00.3083656Z 2025-05-07T19:44:00.3083679Z 2025-05-07T19:44:00.3083683Z 2025-05-07T19:44:00.3149418Z typing-extensions-4. | 88 KB | ########## | 100%  2025-05-07T19:44:00.3150473Z 2025-05-07T19:44:00.3150502Z 2025-05-07T19:44:00.3150514Z 2025-05-07T19:44:00.3150525Z 2025-05-07T19:44:00.3150536Z 2025-05-07T19:44:00.3150569Z 2025-05-07T19:44:00.3150580Z 2025-05-07T19:44:00.3150590Z 2025-05-07T19:44:00.3151407Z typing_extensions-4. | 51 KB | ########## | 100%  2025-05-07T19:44:00.3152320Z 2025-05-07T19:44:00.3152331Z 2025-05-07T19:44:00.3152341Z 2025-05-07T19:44:00.3152353Z 2025-05-07T19:44:00.3152364Z 2025-05-07T19:44:00.3152374Z 2025-05-07T19:44:00.3152384Z 2025-05-07T19:44:00.3152414Z 2025-05-07T19:44:00.3422639Z typing_extensions-4. | 51 KB | ########## | 100%  2025-05-07T19:44:00.3423162Z 2025-05-07T19:44:00.3423166Z 2025-05-07T19:44:00.3423170Z 2025-05-07T19:44:00.3423174Z 2025-05-07T19:44:00.3423178Z 2025-05-07T19:44:00.3423182Z 2025-05-07T19:44:00.3423186Z 2025-05-07T19:44:00.3423209Z 2025-05-07T19:44:00.3423228Z 2025-05-07T19:44:00.3423514Z libgcc-ng-15.1.0 | 34 KB | ########## | 100%  2025-05-07T19:44:00.3424058Z 2025-05-07T19:44:00.3424064Z 2025-05-07T19:44:00.3424068Z 2025-05-07T19:44:00.3424071Z 2025-05-07T19:44:00.3424075Z 2025-05-07T19:44:00.3424078Z 2025-05-07T19:44:00.3424082Z 2025-05-07T19:44:00.3424103Z 2025-05-07T19:44:00.3424107Z 2025-05-07T19:44:00.3460729Z libgcc-ng-15.1.0 | 34 KB | ########## | 100%  2025-05-07T19:44:00.3461657Z 2025-05-07T19:44:00.3462369Z cryptography-44.0.3 | 1.5 MB | ########## | 100%  2025-05-07T19:44:00.3463150Z 2025-05-07T19:44:00.3523007Z cryptography-44.0.3 | 1.5 MB | ########## | 100%  2025-05-07T19:44:00.3523327Z 2025-05-07T19:44:00.3523395Z 2025-05-07T19:44:00.3523399Z 2025-05-07T19:44:00.3523403Z 2025-05-07T19:44:00.3524283Z cffi-1.17.1 | 289 KB | ########## | 100%  2025-05-07T19:44:00.3524580Z 2025-05-07T19:44:00.3524585Z 2025-05-07T19:44:00.3524589Z 2025-05-07T19:44:00.3524615Z 2025-05-07T19:44:00.3530021Z cffi-1.17.1 | 289 KB | ########## | 100%  2025-05-07T19:44:00.3530420Z 2025-05-07T19:44:00.3530870Z 2025-05-07T19:44:00.3531056Z  2025-05-07T19:44:00.3531273Z 2025-05-07T19:44:00.3531278Z 2025-05-07T19:44:00.3531466Z  2025-05-07T19:44:00.3531692Z 2025-05-07T19:44:00.3531696Z 2025-05-07T19:44:00.3531699Z 2025-05-07T19:44:00.3531873Z  2025-05-07T19:44:00.3532109Z 2025-05-07T19:44:00.3532113Z 2025-05-07T19:44:00.3532117Z 2025-05-07T19:44:00.3532121Z 2025-05-07T19:44:00.3532302Z  2025-05-07T19:44:00.3532524Z 2025-05-07T19:44:00.3532547Z 2025-05-07T19:44:00.3532550Z 2025-05-07T19:44:00.3532554Z 2025-05-07T19:44:00.3532558Z 2025-05-07T19:44:00.3533017Z  2025-05-07T19:44:00.3533255Z 2025-05-07T19:44:00.3533259Z 2025-05-07T19:44:00.3533263Z 2025-05-07T19:44:00.3533266Z 2025-05-07T19:44:00.3533269Z 2025-05-07T19:44:00.3533273Z 2025-05-07T19:44:00.3533485Z  2025-05-07T19:44:00.3533717Z 2025-05-07T19:44:00.3533721Z 2025-05-07T19:44:00.3533725Z 2025-05-07T19:44:00.3533728Z 2025-05-07T19:44:00.3533732Z 2025-05-07T19:44:00.3533735Z 2025-05-07T19:44:00.3533738Z 2025-05-07T19:44:00.3533944Z  2025-05-07T19:44:00.3534175Z 2025-05-07T19:44:00.3534179Z 2025-05-07T19:44:00.3534182Z 2025-05-07T19:44:00.3534186Z 2025-05-07T19:44:00.3534189Z 2025-05-07T19:44:00.3534193Z 2025-05-07T19:44:00.3534196Z 2025-05-07T19:44:00.3534200Z 2025-05-07T19:44:00.3534407Z  2025-05-07T19:44:00.3534648Z 2025-05-07T19:44:00.3534652Z 2025-05-07T19:44:00.3534655Z 2025-05-07T19:44:00.3534663Z 2025-05-07T19:44:00.3534667Z 2025-05-07T19:44:00.3534671Z 2025-05-07T19:44:00.3534675Z 2025-05-07T19:44:00.3534679Z 2025-05-07T19:44:00.3534683Z 2025-05-07T19:44:00.3534921Z  done 2025-05-07T19:44:00.4540707Z Preparing transaction: - done 2025-05-07T19:44:00.5551135Z Verifying transaction: | done 2025-05-07T19:44:01.9583500Z Executing transaction: - \ | / - \ | / - \ | / - \ done 2025-05-07T19:44:02.0538481Z [SETUP] Testing pyOpenSSL import ... 2025-05-07T19:44:03.7238239Z [CHECK] Python (sub-)package 'OpenSSL' found ... 2025-05-07T19:44:03.7248090Z [SETUP] Installing libxcrypt ... 2025-05-07T19:44:03.7272950Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y libxcrypt 2025-05-07T19:44:04.3872842Z Channels: 2025-05-07T19:44:04.3873326Z - conda-forge 2025-05-07T19:44:04.3873591Z Platform: linux-64 2025-05-07T19:44:07.4086114Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:44:07.8340893Z Solving environment: \ | done 2025-05-07T19:44:07.8818439Z 2025-05-07T19:44:07.8818687Z ## Package Plan ## 2025-05-07T19:44:07.8818881Z 2025-05-07T19:44:07.8819098Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:07.8819434Z 2025-05-07T19:44:07.8819537Z added / updated specs: 2025-05-07T19:44:07.8819816Z - libxcrypt 2025-05-07T19:44:07.8819957Z 2025-05-07T19:44:07.8819961Z 2025-05-07T19:44:07.8820088Z The following packages will be downloaded: 2025-05-07T19:44:07.8820331Z 2025-05-07T19:44:07.8820468Z package | build 2025-05-07T19:44:07.8820807Z ---------------------------|----------------- 2025-05-07T19:44:07.8821226Z libxcrypt-4.4.36 | hd590300_1 98 KB conda-forge 2025-05-07T19:44:07.8821678Z ------------------------------------------------------------ 2025-05-07T19:44:07.8822076Z Total: 98 KB 2025-05-07T19:44:07.8822324Z 2025-05-07T19:44:07.8822460Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:07.8822698Z 2025-05-07T19:44:07.8822935Z libxcrypt conda-forge/linux-64::libxcrypt-4.4.36-hd590300_1 2025-05-07T19:44:07.8823258Z 2025-05-07T19:44:07.8823262Z 2025-05-07T19:44:07.8823266Z 2025-05-07T19:44:07.8823412Z Downloading and Extracting Packages: ...working... 2025-05-07T19:44:08.0475124Z libxcrypt-4.4.36 | 98 KB | | 0% 2025-05-07T19:44:08.0509217Z libxcrypt-4.4.36 | 98 KB | #6 | 16% 2025-05-07T19:44:08.0609487Z libxcrypt-4.4.36 | 98 KB | ########## | 100% 2025-05-07T19:44:08.0610735Z libxcrypt-4.4.36 | 98 KB | ########## | 100% 2025-05-07T19:44:08.0611731Z 2025-05-07T19:44:08.0612578Z done 2025-05-07T19:44:08.1619120Z Preparing transaction: - done 2025-05-07T19:44:08.2627981Z Verifying transaction: | done 2025-05-07T19:44:08.3641008Z Executing transaction: - done 2025-05-07T19:44:11.6635739Z [SETUP] Copying over ... 2025-05-07T19:44:11.6637143Z + cp /github/home/miniconda/envs/build_binary/include/crypt.h /github/home/miniconda/envs/build_binary/include/python3.13/crypt.h 2025-05-07T19:44:11.6637779Z 2025-05-07T19:44:11.6667059Z 2025-05-07T19:44:13.2517535Z [SETUP] Installed Python version: Python 3.13.2 2025-05-07T19:44:13.2518178Z [SETUP] Successfully created Conda environment: build_binary 2025-05-07T19:44:13.2589472Z ##[group]Run . $PRELUDE; install_cxx_compiler $BUILD_ENV clang 2025-05-07T19:44:13.2590013Z . $PRELUDE; install_cxx_compiler $BUILD_ENV clang 2025-05-07T19:44:13.2590686Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:44:13.2591069Z env: 2025-05-07T19:44:13.2591325Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:44:13.2591699Z BUILD_ENV: build_binary 2025-05-07T19:44:13.2592001Z BUILD_TARGET: genai 2025-05-07T19:44:13.2592288Z BUILD_VARIANT: cuda 2025-05-07T19:44:13.2592547Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:44:13.2592852Z ##[endgroup] 2025-05-07T19:44:13.7543445Z ################################################################################ 2025-05-07T19:44:13.7543893Z # Install C/C++ Compilers 2025-05-07T19:44:13.7544212Z # 2025-05-07T19:44:13.7560908Z # [2025-05-07T19:44:13.755Z] + install_cxx_compiler build_binary clang 2025-05-07T19:44:13.7562811Z ################################################################################ 2025-05-07T19:44:13.7563653Z 2025-05-07T19:44:13.7588088Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:44:13.8519373Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:44:13.8529472Z [INSTALL] Installing GLIBC (architecture = 64) ... 2025-05-07T19:44:13.8554517Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y sysroot_linux-64=2.17 2025-05-07T19:44:14.5211777Z Channels: 2025-05-07T19:44:14.5212523Z - conda-forge 2025-05-07T19:44:14.5215134Z Platform: linux-64 2025-05-07T19:44:17.6450648Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:44:18.0683395Z Solving environment: \ done 2025-05-07T19:44:18.1153389Z 2025-05-07T19:44:18.1154163Z ## Package Plan ## 2025-05-07T19:44:18.1154751Z 2025-05-07T19:44:18.1155362Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:18.1156299Z 2025-05-07T19:44:18.1156625Z added / updated specs: 2025-05-07T19:44:18.1157447Z - sysroot_linux-64=2.17 2025-05-07T19:44:18.1157987Z 2025-05-07T19:44:18.1157999Z 2025-05-07T19:44:18.1158370Z The following packages will be downloaded: 2025-05-07T19:44:18.1159041Z 2025-05-07T19:44:18.1159419Z package | build 2025-05-07T19:44:18.1160405Z ---------------------------|----------------- 2025-05-07T19:44:18.1161735Z kernel-headers_linux-64-3.10.0| he073ed8_18 921 KB conda-forge 2025-05-07T19:44:18.1163316Z sysroot_linux-64-2.17 | h0157908_18 14.5 MB conda-forge 2025-05-07T19:44:18.1163956Z ------------------------------------------------------------ 2025-05-07T19:44:18.1164308Z Total: 15.4 MB 2025-05-07T19:44:18.1164553Z 2025-05-07T19:44:18.1164690Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:18.1164921Z 2025-05-07T19:44:18.1165252Z kernel-headers_li~ conda-forge/noarch::kernel-headers_linux-64-3.10.0-he073ed8_18 2025-05-07T19:44:18.1165847Z sysroot_linux-64 conda-forge/noarch::sysroot_linux-64-2.17-h0157908_18 2025-05-07T19:44:18.1166190Z 2025-05-07T19:44:18.1166194Z 2025-05-07T19:44:18.1166198Z 2025-05-07T19:44:18.1166349Z Downloading and Extracting Packages: ...working... 2025-05-07T19:44:18.1166771Z sysroot_linux-64-2.1 | 14.5 MB | | 0% 2025-05-07T19:44:18.1167011Z 2025-05-07T19:44:18.3179081Z kernel-headers_linux | 921 KB | | 0%  2025-05-07T19:44:18.3180509Z 2025-05-07T19:44:18.3213875Z kernel-headers_linux | 921 KB | 1 | 2%  2025-05-07T19:44:18.3303059Z sysroot_linux-64-2.1 | 14.5 MB | | 0% 2025-05-07T19:44:18.3303915Z 2025-05-07T19:44:18.4216337Z kernel-headers_linux | 921 KB | ########## | 100%  2025-05-07T19:44:18.4978914Z sysroot_linux-64-2.1 | 14.5 MB | ########3 | 84% 2025-05-07T19:44:18.5176581Z sysroot_linux-64-2.1 | 14.5 MB | ########## | 100% 2025-05-07T19:44:18.5177600Z 2025-05-07T19:44:18.5178975Z kernel-headers_linux | 921 KB | ########## | 100%  2025-05-07T19:44:18.5179790Z 2025-05-07T19:44:18.9476923Z kernel-headers_linux | 921 KB | ########## | 100%  2025-05-07T19:44:18.9478212Z sysroot_linux-64-2.1 | 14.5 MB | ########## | 100% 2025-05-07T19:44:18.9479291Z 2025-05-07T19:44:18.9479920Z 2025-05-07T19:44:18.9480490Z  done 2025-05-07T19:44:19.0486284Z Preparing transaction: / done 2025-05-07T19:44:19.2492148Z Verifying transaction: \ | done 2025-05-07T19:44:19.3500231Z Executing transaction: - done 2025-05-07T19:44:19.4334053Z [CHECK] LD_LIBRARY_PATH = 2025-05-07T19:44:19.4335041Z [CHECK] CONDA_PREFIX is not set. 2025-05-07T19:44:21.0660465Z [CHECK] libstdc++.so.6 found in CONDA_PREFIX PATH (symbolic link): /github/home/miniconda/envs/build_binary/lib/libstdc++.so.6 2025-05-07T19:44:21.0680458Z [INSTALL] Installing GCC (11.4.0, 64) through Conda ... 2025-05-07T19:44:21.0707264Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y gxx_linux-64=11.4.0 2025-05-07T19:44:21.7768364Z Channels: 2025-05-07T19:44:21.7769066Z - conda-forge 2025-05-07T19:44:21.7769580Z Platform: linux-64 2025-05-07T19:44:24.8835330Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:44:26.0128681Z Solving environment: \ | / done 2025-05-07T19:44:26.0608278Z 2025-05-07T19:44:26.0608818Z ## Package Plan ## 2025-05-07T19:44:26.0609299Z 2025-05-07T19:44:26.0609935Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:26.0610935Z 2025-05-07T19:44:26.0611221Z added / updated specs: 2025-05-07T19:44:26.0611990Z - gxx_linux-64=11.4.0 2025-05-07T19:44:26.0612455Z 2025-05-07T19:44:26.0612467Z 2025-05-07T19:44:26.0612855Z The following packages will be downloaded: 2025-05-07T19:44:26.0613511Z 2025-05-07T19:44:26.0613859Z package | build 2025-05-07T19:44:26.0614894Z ---------------------------|----------------- 2025-05-07T19:44:26.0615764Z binutils_impl_linux-64-2.40| ha1999f0_7 6.0 MB conda-forge 2025-05-07T19:44:26.0616380Z binutils_linux-64-2.40 | hb3c18ed_4 28 KB conda-forge 2025-05-07T19:44:26.0616889Z gcc_impl_linux-64-11.4.0 | h00c12a0_13 53.0 MB conda-forge 2025-05-07T19:44:26.0617397Z gcc_linux-64-11.4.0 | ha077dfb_4 31 KB conda-forge 2025-05-07T19:44:26.0617917Z gxx_impl_linux-64-11.4.0 | h634f3ee_13 11.2 MB conda-forge 2025-05-07T19:44:26.0618399Z gxx_linux-64-11.4.0 | h35bfe5d_4 29 KB conda-forge 2025-05-07T19:44:26.0618895Z ld_impl_linux-64-2.40 | hf3520f5_7 691 KB conda-forge 2025-05-07T19:44:26.0619410Z libgcc-devel_linux-64-11.4.0| h8f596e0_113 2.3 MB conda-forge 2025-05-07T19:44:26.0619957Z libsanitizer-11.4.0 | h5763a12_13 3.5 MB conda-forge 2025-05-07T19:44:26.0620476Z libstdcxx-15.1.0 | h8f9b012_2 3.7 MB conda-forge 2025-05-07T19:44:26.0620991Z libstdcxx-devel_linux-64-11.4.0| h8f596e0_113 11.1 MB conda-forge 2025-05-07T19:44:26.0621544Z libstdcxx-ng-15.1.0 | h4852527_2 34 KB conda-forge 2025-05-07T19:44:26.0622098Z ------------------------------------------------------------ 2025-05-07T19:44:26.0622780Z Total: 91.6 MB 2025-05-07T19:44:26.0622999Z 2025-05-07T19:44:26.0623324Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:26.0623596Z 2025-05-07T19:44:26.0623903Z binutils_impl_lin~ conda-forge/linux-64::binutils_impl_linux-64-2.40-ha1999f0_7 2025-05-07T19:44:26.0624542Z binutils_linux-64 conda-forge/linux-64::binutils_linux-64-2.40-hb3c18ed_4 2025-05-07T19:44:26.0625125Z gcc_impl_linux-64 conda-forge/linux-64::gcc_impl_linux-64-11.4.0-h00c12a0_13 2025-05-07T19:44:26.0625877Z gcc_linux-64 conda-forge/linux-64::gcc_linux-64-11.4.0-ha077dfb_4 2025-05-07T19:44:26.0626436Z gxx_impl_linux-64 conda-forge/linux-64::gxx_impl_linux-64-11.4.0-h634f3ee_13 2025-05-07T19:44:26.0627024Z gxx_linux-64 conda-forge/linux-64::gxx_linux-64-11.4.0-h35bfe5d_4 2025-05-07T19:44:26.0627636Z libgcc-devel_linu~ conda-forge/noarch::libgcc-devel_linux-64-11.4.0-h8f596e0_113 2025-05-07T19:44:26.0628252Z libsanitizer conda-forge/linux-64::libsanitizer-11.4.0-h5763a12_13 2025-05-07T19:44:26.0628827Z libstdcxx conda-forge/linux-64::libstdcxx-15.1.0-h8f9b012_2 2025-05-07T19:44:26.0629422Z libstdcxx-devel_l~ conda-forge/noarch::libstdcxx-devel_linux-64-11.4.0-h8f596e0_113 2025-05-07T19:44:26.0629850Z 2025-05-07T19:44:26.0629986Z The following packages will be UPDATED: 2025-05-07T19:44:26.0630213Z 2025-05-07T19:44:26.0630586Z ld_impl_linux-64 pkgs/main::ld_impl_linux-64-2.40-h12e~ --> conda-forge::ld_impl_linux-64-2.40-hf3520f5_7 2025-05-07T19:44:26.0631372Z libstdcxx-ng pkgs/main::libstdcxx-ng-11.2.0-h12345~ --> conda-forge::libstdcxx-ng-15.1.0-h4852527_2 2025-05-07T19:44:26.0631854Z 2025-05-07T19:44:26.0631858Z 2025-05-07T19:44:26.0631861Z 2025-05-07T19:44:26.0632025Z Downloading and Extracting Packages: ...working... 2025-05-07T19:44:26.0632467Z gcc_impl_linux-64-11 | 53.0 MB | | 0% 2025-05-07T19:44:26.0632721Z 2025-05-07T19:44:26.0633088Z gxx_impl_linux-64-11 | 11.2 MB | | 0%  2025-05-07T19:44:26.0633394Z 2025-05-07T19:44:26.0633397Z 2025-05-07T19:44:26.0633641Z libstdcxx-devel_linu | 11.1 MB | | 0%  2025-05-07T19:44:26.0633929Z 2025-05-07T19:44:26.0633932Z 2025-05-07T19:44:26.0633936Z 2025-05-07T19:44:26.0634222Z binutils_impl_linux- | 6.0 MB | | 0%  2025-05-07T19:44:26.0634512Z 2025-05-07T19:44:26.0634516Z 2025-05-07T19:44:26.0634519Z 2025-05-07T19:44:26.0634523Z 2025-05-07T19:44:26.0640141Z libstdcxx-15.1.0 | 3.7 MB | | 0%  2025-05-07T19:44:26.0640639Z 2025-05-07T19:44:26.0640648Z 2025-05-07T19:44:26.0640652Z 2025-05-07T19:44:26.0640656Z 2025-05-07T19:44:26.0668547Z 2025-05-07T19:44:26.0669862Z libsanitizer-11.4.0 | 3.5 MB | | 0%  2025-05-07T19:44:26.0670792Z 2025-05-07T19:44:26.0670806Z 2025-05-07T19:44:26.0670816Z 2025-05-07T19:44:26.0670827Z 2025-05-07T19:44:26.0670874Z 2025-05-07T19:44:26.0670885Z 2025-05-07T19:44:26.0671707Z libgcc-devel_linux-6 | 2.3 MB | | 0%  2025-05-07T19:44:26.0672608Z 2025-05-07T19:44:26.0672619Z 2025-05-07T19:44:26.0672630Z 2025-05-07T19:44:26.0672641Z 2025-05-07T19:44:26.0672652Z 2025-05-07T19:44:26.0672662Z 2025-05-07T19:44:26.0672740Z 2025-05-07T19:44:26.0673493Z ld_impl_linux-64-2.4 | 691 KB | | 0%  2025-05-07T19:44:26.0674431Z 2025-05-07T19:44:26.0674435Z 2025-05-07T19:44:26.0674438Z 2025-05-07T19:44:26.0674442Z 2025-05-07T19:44:26.0674446Z 2025-05-07T19:44:26.0674450Z 2025-05-07T19:44:26.0674454Z 2025-05-07T19:44:26.0674466Z 2025-05-07T19:44:26.0675040Z libstdcxx-ng-15.1.0 | 34 KB | | 0%  2025-05-07T19:44:26.0675354Z 2025-05-07T19:44:26.0675358Z 2025-05-07T19:44:26.0675361Z 2025-05-07T19:44:26.0675364Z 2025-05-07T19:44:26.0675368Z 2025-05-07T19:44:26.0675372Z 2025-05-07T19:44:26.0675375Z 2025-05-07T19:44:26.0675379Z 2025-05-07T19:44:26.0675383Z 2025-05-07T19:44:26.0675949Z gcc_linux-64-11.4.0 | 31 KB | | 0%  2025-05-07T19:44:26.0676252Z 2025-05-07T19:44:26.0676256Z 2025-05-07T19:44:26.0676259Z 2025-05-07T19:44:26.0676263Z 2025-05-07T19:44:26.0676266Z 2025-05-07T19:44:26.0676269Z 2025-05-07T19:44:26.0676273Z 2025-05-07T19:44:26.0676276Z 2025-05-07T19:44:26.0676280Z 2025-05-07T19:44:26.0676298Z 2025-05-07T19:44:26.0676599Z gxx_linux-64-11.4.0 | 29 KB | | 0%  2025-05-07T19:44:26.0676905Z 2025-05-07T19:44:26.0676910Z 2025-05-07T19:44:26.0676913Z 2025-05-07T19:44:26.0677023Z 2025-05-07T19:44:26.0677028Z 2025-05-07T19:44:26.0677032Z 2025-05-07T19:44:26.0677035Z 2025-05-07T19:44:26.0677039Z 2025-05-07T19:44:26.0677043Z 2025-05-07T19:44:26.0677046Z 2025-05-07T19:44:26.0677079Z 2025-05-07T19:44:26.2157112Z binutils_linux-64-2. | 28 KB | | 0%  2025-05-07T19:44:26.2157483Z 2025-05-07T19:44:26.2157668Z 2025-05-07T19:44:26.2157700Z 2025-05-07T19:44:26.2158400Z binutils_impl_linux- | 6.0 MB | | 0%  2025-05-07T19:44:26.2158758Z 2025-05-07T19:44:26.2158762Z 2025-05-07T19:44:26.2160504Z libstdcxx-devel_linu | 11.1 MB | | 0%  2025-05-07T19:44:26.2160809Z 2025-05-07T19:44:26.2160813Z 2025-05-07T19:44:26.2160816Z 2025-05-07T19:44:26.2161889Z 2025-05-07T19:44:26.2945486Z libstdcxx-15.1.0 | 3.7 MB | | 0%  2025-05-07T19:44:26.3014796Z gcc_impl_linux-64-11 | 53.0 MB | | 0% 2025-05-07T19:44:26.3015113Z 2025-05-07T19:44:26.3015432Z 2025-05-07T19:44:26.3015471Z 2025-05-07T19:44:26.3015476Z 2025-05-07T19:44:26.3259475Z libstdcxx-15.1.0 | 3.7 MB | ########## | 100%  2025-05-07T19:44:26.3260388Z 2025-05-07T19:44:26.3292199Z gxx_impl_linux-64-11 | 11.2 MB | | 0%  2025-05-07T19:44:26.3292516Z 2025-05-07T19:44:26.3292521Z 2025-05-07T19:44:26.3292525Z 2025-05-07T19:44:26.3292806Z binutils_impl_linux- | 6.0 MB | ########## | 100%  2025-05-07T19:44:26.3293194Z 2025-05-07T19:44:26.3293199Z 2025-05-07T19:44:26.3293202Z 2025-05-07T19:44:26.3475447Z binutils_impl_linux- | 6.0 MB | ########## | 100%  2025-05-07T19:44:26.3475873Z 2025-05-07T19:44:26.3475945Z 2025-05-07T19:44:26.3475949Z 2025-05-07T19:44:26.3475963Z 2025-05-07T19:44:26.3475967Z 2025-05-07T19:44:26.3704602Z libsanitizer-11.4.0 | 3.5 MB | | 0%  2025-05-07T19:44:26.3722239Z 2025-05-07T19:44:26.3722246Z 2025-05-07T19:44:26.3722251Z 2025-05-07T19:44:26.3722254Z 2025-05-07T19:44:26.3722277Z 2025-05-07T19:44:26.3722298Z 2025-05-07T19:44:26.3989718Z libgcc-devel_linux-6 | 2.3 MB | | 1%  2025-05-07T19:44:26.3990085Z 2025-05-07T19:44:26.3990109Z 2025-05-07T19:44:26.3990364Z libstdcxx-devel_linu | 11.1 MB | ########## | 100%  2025-05-07T19:44:26.3990639Z 2025-05-07T19:44:26.3990653Z 2025-05-07T19:44:26.4220696Z libstdcxx-devel_linu | 11.1 MB | ########## | 100%  2025-05-07T19:44:26.4221057Z 2025-05-07T19:44:26.4221062Z 2025-05-07T19:44:26.4221066Z 2025-05-07T19:44:26.4221069Z 2025-05-07T19:44:26.4222238Z 2025-05-07T19:44:26.4298124Z libsanitizer-11.4.0 | 3.5 MB | ########## | 100%  2025-05-07T19:44:26.4298466Z 2025-05-07T19:44:26.4333188Z gxx_impl_linux-64-11 | 11.2 MB | ####5 | 45%  2025-05-07T19:44:26.4333550Z 2025-05-07T19:44:26.4333556Z 2025-05-07T19:44:26.4333560Z 2025-05-07T19:44:26.4333565Z 2025-05-07T19:44:26.4333569Z 2025-05-07T19:44:26.4333574Z 2025-05-07T19:44:26.4526918Z libgcc-devel_linux-6 | 2.3 MB | ########## | 100%  2025-05-07T19:44:26.4582353Z gcc_impl_linux-64-11 | 53.0 MB | #4 | 14% 2025-05-07T19:44:26.4583143Z 2025-05-07T19:44:26.4583158Z 2025-05-07T19:44:26.4583169Z 2025-05-07T19:44:26.4583183Z 2025-05-07T19:44:26.4584270Z libstdcxx-15.1.0 | 3.7 MB | ########## | 100%  2025-05-07T19:44:26.4584612Z 2025-05-07T19:44:26.4584867Z 2025-05-07T19:44:26.4584871Z 2025-05-07T19:44:26.4584887Z 2025-05-07T19:44:26.4602347Z libstdcxx-15.1.0 | 3.7 MB | ########## | 100%  2025-05-07T19:44:26.4602703Z 2025-05-07T19:44:26.4602708Z 2025-05-07T19:44:26.4602712Z 2025-05-07T19:44:26.4602716Z 2025-05-07T19:44:26.4602719Z 2025-05-07T19:44:26.4602722Z 2025-05-07T19:44:26.4602726Z 2025-05-07T19:44:26.4675720Z ld_impl_linux-64-2.4 | 691 KB | 2 | 2%  2025-05-07T19:44:26.4676685Z 2025-05-07T19:44:26.4676700Z 2025-05-07T19:44:26.4676712Z 2025-05-07T19:44:26.4676722Z 2025-05-07T19:44:26.4678191Z 2025-05-07T19:44:26.4678206Z 2025-05-07T19:44:26.4678216Z 2025-05-07T19:44:26.4678226Z 2025-05-07T19:44:26.4678237Z 2025-05-07T19:44:26.4693391Z gcc_linux-64-11.4.0 | 31 KB | #####2 | 52%  2025-05-07T19:44:26.4694353Z 2025-05-07T19:44:26.4694366Z 2025-05-07T19:44:26.4694377Z 2025-05-07T19:44:26.4694388Z 2025-05-07T19:44:26.4694398Z 2025-05-07T19:44:26.4694452Z 2025-05-07T19:44:26.4694463Z 2025-05-07T19:44:26.4694473Z 2025-05-07T19:44:26.4694484Z 2025-05-07T19:44:26.4718437Z gcc_linux-64-11.4.0 | 31 KB | ########## | 100%  2025-05-07T19:44:26.4718835Z 2025-05-07T19:44:26.4718977Z 2025-05-07T19:44:26.4718981Z 2025-05-07T19:44:26.4718986Z 2025-05-07T19:44:26.4719069Z 2025-05-07T19:44:26.4719073Z 2025-05-07T19:44:26.4719076Z 2025-05-07T19:44:26.4719096Z 2025-05-07T19:44:26.4736260Z libstdcxx-ng-15.1.0 | 34 KB | ####7 | 47%  2025-05-07T19:44:26.4736606Z 2025-05-07T19:44:26.4736626Z 2025-05-07T19:44:26.4736630Z 2025-05-07T19:44:26.4736634Z 2025-05-07T19:44:26.4736638Z 2025-05-07T19:44:26.4736666Z 2025-05-07T19:44:26.4736669Z 2025-05-07T19:44:26.4736697Z 2025-05-07T19:44:26.4821457Z libstdcxx-ng-15.1.0 | 34 KB | ########## | 100%  2025-05-07T19:44:26.4822472Z 2025-05-07T19:44:26.4822478Z 2025-05-07T19:44:26.4822481Z 2025-05-07T19:44:26.4822505Z 2025-05-07T19:44:26.4822531Z 2025-05-07T19:44:26.4822535Z 2025-05-07T19:44:26.4822538Z 2025-05-07T19:44:26.5079140Z ld_impl_linux-64-2.4 | 691 KB | ########## | 100%  2025-05-07T19:44:26.5080068Z 2025-05-07T19:44:26.5080115Z 2025-05-07T19:44:26.5080129Z 2025-05-07T19:44:26.5080140Z 2025-05-07T19:44:26.5080179Z 2025-05-07T19:44:26.5080191Z 2025-05-07T19:44:26.5080202Z 2025-05-07T19:44:26.5080212Z 2025-05-07T19:44:26.5080223Z 2025-05-07T19:44:26.5080234Z 2025-05-07T19:44:26.5097763Z gxx_linux-64-11.4.0 | 29 KB | #####5 | 55%  2025-05-07T19:44:26.5098695Z 2025-05-07T19:44:26.5098709Z 2025-05-07T19:44:26.5098750Z 2025-05-07T19:44:26.5098760Z 2025-05-07T19:44:26.5098771Z 2025-05-07T19:44:26.5098781Z 2025-05-07T19:44:26.5098792Z 2025-05-07T19:44:26.5098802Z 2025-05-07T19:44:26.5098812Z 2025-05-07T19:44:26.5098834Z 2025-05-07T19:44:26.5195589Z gxx_linux-64-11.4.0 | 29 KB | ########## | 100%  2025-05-07T19:44:26.5196064Z 2025-05-07T19:44:26.5196331Z 2025-05-07T19:44:26.5196340Z 2025-05-07T19:44:26.5196345Z 2025-05-07T19:44:26.5196350Z 2025-05-07T19:44:26.5196355Z 2025-05-07T19:44:26.5196359Z 2025-05-07T19:44:26.5196364Z 2025-05-07T19:44:26.5196369Z 2025-05-07T19:44:26.5196373Z 2025-05-07T19:44:26.5196378Z 2025-05-07T19:44:26.5213949Z binutils_linux-64-2. | 28 KB | #####6 | 56%  2025-05-07T19:44:26.5214329Z 2025-05-07T19:44:26.5214377Z 2025-05-07T19:44:26.5214381Z 2025-05-07T19:44:26.5214385Z 2025-05-07T19:44:26.5214400Z 2025-05-07T19:44:26.5214416Z 2025-05-07T19:44:26.5214433Z 2025-05-07T19:44:26.5214537Z 2025-05-07T19:44:26.5214545Z 2025-05-07T19:44:26.5214551Z 2025-05-07T19:44:26.5214556Z 2025-05-07T19:44:26.5529196Z binutils_linux-64-2. | 28 KB | ########## | 100%  2025-05-07T19:44:26.6135633Z gcc_impl_linux-64-11 | 53.0 MB | ###3 | 33% 2025-05-07T19:44:26.6135987Z 2025-05-07T19:44:26.6270544Z gxx_impl_linux-64-11 | 11.2 MB | #######2 | 73%  2025-05-07T19:44:26.6271383Z 2025-05-07T19:44:26.6271426Z 2025-05-07T19:44:26.6271438Z 2025-05-07T19:44:26.6271449Z 2025-05-07T19:44:26.6271459Z 2025-05-07T19:44:26.6272242Z libsanitizer-11.4.0 | 3.5 MB | ########## | 100%  2025-05-07T19:44:26.6273141Z 2025-05-07T19:44:26.6273145Z 2025-05-07T19:44:26.6273150Z 2025-05-07T19:44:26.6273153Z 2025-05-07T19:44:26.6273158Z 2025-05-07T19:44:26.6528354Z libsanitizer-11.4.0 | 3.5 MB | ########## | 100%  2025-05-07T19:44:26.7102452Z gcc_impl_linux-64-11 | 53.0 MB | #####3 | 53% 2025-05-07T19:44:26.7103306Z 2025-05-07T19:44:26.7103322Z 2025-05-07T19:44:26.7103334Z 2025-05-07T19:44:26.7283904Z binutils_impl_linux- | 6.0 MB | ########## | 100%  2025-05-07T19:44:26.7284234Z 2025-05-07T19:44:26.7284270Z 2025-05-07T19:44:26.7284274Z 2025-05-07T19:44:26.7284277Z 2025-05-07T19:44:26.7284282Z 2025-05-07T19:44:26.7284285Z 2025-05-07T19:44:26.7284597Z libgcc-devel_linux-6 | 2.3 MB | ########## | 100%  2025-05-07T19:44:26.7284919Z 2025-05-07T19:44:26.7284923Z 2025-05-07T19:44:26.7284926Z 2025-05-07T19:44:26.7284931Z 2025-05-07T19:44:26.7284957Z 2025-05-07T19:44:26.7284961Z 2025-05-07T19:44:26.7395384Z libgcc-devel_linux-6 | 2.3 MB | ########## | 100%  2025-05-07T19:44:26.7395739Z 2025-05-07T19:44:26.7395744Z 2025-05-07T19:44:26.7395748Z 2025-05-07T19:44:26.7395751Z 2025-05-07T19:44:26.7395755Z 2025-05-07T19:44:26.7395779Z 2025-05-07T19:44:26.7395783Z 2025-05-07T19:44:26.7395787Z 2025-05-07T19:44:26.7395807Z 2025-05-07T19:44:26.7396085Z gcc_linux-64-11.4.0 | 31 KB | ########## | 100%  2025-05-07T19:44:26.7396385Z 2025-05-07T19:44:26.7396388Z 2025-05-07T19:44:26.7396392Z 2025-05-07T19:44:26.7396395Z 2025-05-07T19:44:26.7396399Z 2025-05-07T19:44:26.7396402Z 2025-05-07T19:44:26.7396427Z 2025-05-07T19:44:26.7396431Z 2025-05-07T19:44:26.7396434Z 2025-05-07T19:44:26.7529100Z gcc_linux-64-11.4.0 | 31 KB | ########## | 100%  2025-05-07T19:44:26.7662757Z gcc_impl_linux-64-11 | 53.0 MB | #######6 | 77% 2025-05-07T19:44:26.7663530Z 2025-05-07T19:44:26.7663536Z 2025-05-07T19:44:26.7663539Z 2025-05-07T19:44:26.7663543Z 2025-05-07T19:44:26.7663546Z 2025-05-07T19:44:26.7663550Z 2025-05-07T19:44:26.7663553Z 2025-05-07T19:44:26.7663556Z 2025-05-07T19:44:26.7663902Z libstdcxx-ng-15.1.0 | 34 KB | ########## | 100%  2025-05-07T19:44:26.7664241Z 2025-05-07T19:44:26.7664245Z 2025-05-07T19:44:26.7664262Z 2025-05-07T19:44:26.7664266Z 2025-05-07T19:44:26.7664269Z 2025-05-07T19:44:26.7664273Z 2025-05-07T19:44:26.7664276Z 2025-05-07T19:44:26.7664279Z 2025-05-07T19:44:26.7787505Z libstdcxx-ng-15.1.0 | 34 KB | ########## | 100%  2025-05-07T19:44:26.7788512Z 2025-05-07T19:44:26.7788526Z 2025-05-07T19:44:26.7788537Z 2025-05-07T19:44:26.7788548Z 2025-05-07T19:44:26.7788591Z 2025-05-07T19:44:26.7788602Z 2025-05-07T19:44:26.7788612Z 2025-05-07T19:44:26.7789385Z ld_impl_linux-64-2.4 | 691 KB | ########## | 100%  2025-05-07T19:44:26.7790283Z 2025-05-07T19:44:26.7790294Z 2025-05-07T19:44:26.7790304Z 2025-05-07T19:44:26.7790314Z 2025-05-07T19:44:26.7790323Z 2025-05-07T19:44:26.7790333Z 2025-05-07T19:44:26.7790344Z 2025-05-07T19:44:26.7962788Z ld_impl_linux-64-2.4 | 691 KB | ########## | 100%  2025-05-07T19:44:26.7963161Z 2025-05-07T19:44:26.7963166Z 2025-05-07T19:44:26.7963170Z 2025-05-07T19:44:26.7963174Z 2025-05-07T19:44:26.7963195Z 2025-05-07T19:44:26.7963199Z 2025-05-07T19:44:26.7963204Z 2025-05-07T19:44:26.7963219Z 2025-05-07T19:44:26.7963309Z 2025-05-07T19:44:26.7963328Z 2025-05-07T19:44:26.7963914Z gxx_linux-64-11.4.0 | 29 KB | ########## | 100%  2025-05-07T19:44:26.7964240Z 2025-05-07T19:44:26.7964257Z 2025-05-07T19:44:26.7964287Z 2025-05-07T19:44:26.7964291Z 2025-05-07T19:44:26.7964513Z 2025-05-07T19:44:26.7964518Z 2025-05-07T19:44:26.7964521Z 2025-05-07T19:44:26.7964525Z 2025-05-07T19:44:26.7964529Z 2025-05-07T19:44:26.7964533Z 2025-05-07T19:44:26.7980605Z gxx_linux-64-11.4.0 | 29 KB | ########## | 100%  2025-05-07T19:44:26.7981526Z 2025-05-07T19:44:26.7982187Z gxx_impl_linux-64-11 | 11.2 MB | ########## | 100%  2025-05-07T19:44:26.7982937Z 2025-05-07T19:44:26.8064386Z gxx_impl_linux-64-11 | 11.2 MB | ########## | 100%  2025-05-07T19:44:26.8065258Z 2025-05-07T19:44:26.8065272Z 2025-05-07T19:44:26.8065283Z 2025-05-07T19:44:26.8066471Z 2025-05-07T19:44:26.8066478Z 2025-05-07T19:44:26.8066482Z 2025-05-07T19:44:26.8066485Z 2025-05-07T19:44:26.8066488Z 2025-05-07T19:44:26.8066492Z 2025-05-07T19:44:26.8066495Z 2025-05-07T19:44:26.8066499Z 2025-05-07T19:44:26.8066855Z binutils_linux-64-2. | 28 KB | ########## | 100%  2025-05-07T19:44:26.8067184Z 2025-05-07T19:44:26.8067187Z 2025-05-07T19:44:26.8067202Z 2025-05-07T19:44:26.8067206Z 2025-05-07T19:44:26.8067209Z 2025-05-07T19:44:26.8067213Z 2025-05-07T19:44:26.8067216Z 2025-05-07T19:44:26.8067219Z 2025-05-07T19:44:26.8067223Z 2025-05-07T19:44:26.8067226Z 2025-05-07T19:44:26.8067229Z 2025-05-07T19:44:26.9876126Z binutils_linux-64-2. | 28 KB | ########## | 100%  2025-05-07T19:44:26.9876494Z 2025-05-07T19:44:27.0206587Z gxx_impl_linux-64-11 | 11.2 MB | ########## | 100%  2025-05-07T19:44:27.0207448Z 2025-05-07T19:44:27.0207464Z 2025-05-07T19:44:27.0503576Z libstdcxx-devel_linu | 11.1 MB | ########## | 100%  2025-05-07T19:44:27.0504368Z gcc_impl_linux-64-11 | 53.0 MB | ########## | 100% 2025-05-07T19:44:27.5770878Z gcc_impl_linux-64-11 | 53.0 MB | ########## | 100% 2025-05-07T19:44:27.5773142Z gcc_impl_linux-64-11 | 53.0 MB | ########## | 100% 2025-05-07T19:44:27.5774070Z 2025-05-07T19:44:27.5774299Z 2025-05-07T19:44:27.5774654Z  2025-05-07T19:44:27.5775073Z 2025-05-07T19:44:27.5775078Z 2025-05-07T19:44:27.5775380Z  2025-05-07T19:44:27.5775666Z 2025-05-07T19:44:27.5775670Z 2025-05-07T19:44:27.5775674Z 2025-05-07T19:44:27.5775866Z  2025-05-07T19:44:27.5776102Z 2025-05-07T19:44:27.5776105Z 2025-05-07T19:44:27.5776109Z 2025-05-07T19:44:27.5776114Z 2025-05-07T19:44:27.5776352Z  2025-05-07T19:44:27.5776586Z 2025-05-07T19:44:27.5776589Z 2025-05-07T19:44:27.5776594Z 2025-05-07T19:44:27.5776597Z 2025-05-07T19:44:27.5776601Z 2025-05-07T19:44:27.5776824Z  2025-05-07T19:44:27.5777062Z 2025-05-07T19:44:27.5777066Z 2025-05-07T19:44:27.5777069Z 2025-05-07T19:44:27.5777073Z 2025-05-07T19:44:27.5777077Z 2025-05-07T19:44:27.5777086Z 2025-05-07T19:44:27.5777284Z  2025-05-07T19:44:27.5777550Z 2025-05-07T19:44:27.5777554Z 2025-05-07T19:44:27.5777557Z 2025-05-07T19:44:27.5777561Z 2025-05-07T19:44:27.5777564Z 2025-05-07T19:44:27.5777568Z 2025-05-07T19:44:27.5777571Z 2025-05-07T19:44:27.5777770Z  2025-05-07T19:44:27.5778039Z 2025-05-07T19:44:27.5778043Z 2025-05-07T19:44:27.5778046Z 2025-05-07T19:44:27.5778050Z 2025-05-07T19:44:27.5778053Z 2025-05-07T19:44:27.5778062Z 2025-05-07T19:44:27.5778065Z 2025-05-07T19:44:27.5778069Z 2025-05-07T19:44:27.5778287Z  2025-05-07T19:44:27.5778555Z 2025-05-07T19:44:27.5778559Z 2025-05-07T19:44:27.5778562Z 2025-05-07T19:44:27.5778566Z 2025-05-07T19:44:27.5778569Z 2025-05-07T19:44:27.5778573Z 2025-05-07T19:44:27.5778576Z 2025-05-07T19:44:27.5778580Z 2025-05-07T19:44:27.5778843Z 2025-05-07T19:44:27.5779059Z  2025-05-07T19:44:27.5779338Z 2025-05-07T19:44:27.5779342Z 2025-05-07T19:44:27.5779345Z 2025-05-07T19:44:27.5779349Z 2025-05-07T19:44:27.5779352Z 2025-05-07T19:44:27.5779356Z 2025-05-07T19:44:27.5779359Z 2025-05-07T19:44:27.5779362Z 2025-05-07T19:44:27.5779366Z 2025-05-07T19:44:27.5779369Z 2025-05-07T19:44:27.5779575Z  2025-05-07T19:44:27.5779855Z 2025-05-07T19:44:27.5779859Z 2025-05-07T19:44:27.5779978Z 2025-05-07T19:44:27.5779983Z 2025-05-07T19:44:27.5779986Z 2025-05-07T19:44:27.5779990Z 2025-05-07T19:44:27.5779993Z 2025-05-07T19:44:27.5779996Z 2025-05-07T19:44:27.5780000Z 2025-05-07T19:44:27.5780003Z 2025-05-07T19:44:27.5780007Z 2025-05-07T19:44:27.5780233Z  done 2025-05-07T19:44:27.6784644Z Preparing transaction: \ done 2025-05-07T19:44:27.8796627Z Verifying transaction: / - done 2025-05-07T19:44:27.9810710Z Executing transaction: | done 2025-05-07T19:44:28.0688343Z [INSTALL] Setting the C/C++ compiler symlinks ... 2025-05-07T19:44:31.7778517Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-cc /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:44:31.7780303Z 2025-05-07T19:44:31.7801777Z 2025-05-07T19:44:31.7822401Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-cc /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:44:31.7823057Z 2025-05-07T19:44:31.7840198Z 2025-05-07T19:44:31.7858196Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:44:31.7858851Z 2025-05-07T19:44:31.7876286Z 2025-05-07T19:44:31.7897980Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:44:31.7898632Z 2025-05-07T19:44:31.7914102Z 2025-05-07T19:44:31.7920401Z [INSTALL] Installing Clang (16.0.6, 64) and relevant libraries through Conda ... 2025-05-07T19:44:31.7946473Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y clangxx=16.0.6 libcxx llvm-openmp=16.0.6 compiler-rt=16.0.6 2025-05-07T19:44:32.5070728Z Channels: 2025-05-07T19:44:32.5071095Z - conda-forge 2025-05-07T19:44:32.5071406Z Platform: linux-64 2025-05-07T19:44:35.6488234Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:44:37.0000829Z Solving environment: \ | / done 2025-05-07T19:44:37.0577866Z 2025-05-07T19:44:37.0578390Z ## Package Plan ## 2025-05-07T19:44:37.0578620Z 2025-05-07T19:44:37.0578888Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:37.0579230Z 2025-05-07T19:44:37.0579342Z added / updated specs: 2025-05-07T19:44:37.0579688Z - clangxx=16.0.6 2025-05-07T19:44:37.0580047Z - compiler-rt=16.0.6 2025-05-07T19:44:37.0580314Z - libcxx 2025-05-07T19:44:37.0580587Z - llvm-openmp=16.0.6 2025-05-07T19:44:37.0580760Z 2025-05-07T19:44:37.0580764Z 2025-05-07T19:44:37.0580898Z The following packages will be downloaded: 2025-05-07T19:44:37.0581162Z 2025-05-07T19:44:37.0581294Z package | build 2025-05-07T19:44:37.0581655Z ---------------------------|----------------- 2025-05-07T19:44:37.0582103Z clang-16.0.6 |default_h9e3a008_14 110 KB conda-forge 2025-05-07T19:44:37.0582643Z clang-16-16.0.6 |default_hb5137d0_14 780 KB conda-forge 2025-05-07T19:44:37.0583137Z clangxx-16.0.6 |default_ha78316a_14 110 KB conda-forge 2025-05-07T19:44:37.0583657Z compiler-rt-16.0.6 | h00ab1b0_2 107 KB conda-forge 2025-05-07T19:44:37.0584165Z compiler-rt_linux-64-16.0.6| h00ab1b0_2 36.0 MB conda-forge 2025-05-07T19:44:37.0585044Z icu-73.2 | h59595ed_0 11.5 MB conda-forge 2025-05-07T19:44:37.0585527Z libclang-cpp16-16.0.6 |default_hb5137d0_14 17.3 MB conda-forge 2025-05-07T19:44:37.0586061Z libcxx-19.1.7 | h2713693_1 1000 KB conda-forge 2025-05-07T19:44:37.0586548Z libcxxabi-19.1.7 | hd85fd95_1 158 KB conda-forge 2025-05-07T19:44:37.0587018Z libiconv-1.18 | h4ce23a2_1 696 KB conda-forge 2025-05-07T19:44:37.0587655Z libllvm16-16.0.6 | hb3ce162_3 33.7 MB conda-forge 2025-05-07T19:44:37.0588134Z libxml2-2.12.7 | hc051c1a_1 688 KB conda-forge 2025-05-07T19:44:37.0588621Z libzlib-1.2.13 | h4ab18f5_6 60 KB conda-forge 2025-05-07T19:44:37.0589093Z llvm-openmp-16.0.6 | h4dfa4b3_0 39.9 MB conda-forge 2025-05-07T19:44:37.0589584Z zlib-1.2.13 | h4ab18f5_6 91 KB conda-forge 2025-05-07T19:44:37.0590054Z zstd-1.5.6 | ha6fb4c9_0 542 KB conda-forge 2025-05-07T19:44:37.0590480Z ------------------------------------------------------------ 2025-05-07T19:44:37.0590890Z Total: 142.6 MB 2025-05-07T19:44:37.0591124Z 2025-05-07T19:44:37.0591271Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:37.0591545Z 2025-05-07T19:44:37.0591805Z clang conda-forge/linux-64::clang-16.0.6-default_h9e3a008_14 2025-05-07T19:44:37.0592330Z clang-16 conda-forge/linux-64::clang-16-16.0.6-default_hb5137d0_14 2025-05-07T19:44:37.0592849Z clangxx conda-forge/linux-64::clangxx-16.0.6-default_ha78316a_14 2025-05-07T19:44:37.0593377Z compiler-rt conda-forge/linux-64::compiler-rt-16.0.6-h00ab1b0_2 2025-05-07T19:44:37.0593936Z compiler-rt_linux~ conda-forge/noarch::compiler-rt_linux-64-16.0.6-h00ab1b0_2 2025-05-07T19:44:37.0594453Z icu conda-forge/linux-64::icu-73.2-h59595ed_0 2025-05-07T19:44:37.0594980Z libclang-cpp16 conda-forge/linux-64::libclang-cpp16-16.0.6-default_hb5137d0_14 2025-05-07T19:44:37.0595517Z libcxx conda-forge/linux-64::libcxx-19.1.7-h2713693_1 2025-05-07T19:44:37.0595998Z libcxxabi conda-forge/linux-64::libcxxabi-19.1.7-hd85fd95_1 2025-05-07T19:44:37.0596470Z libiconv conda-forge/linux-64::libiconv-1.18-h4ce23a2_1 2025-05-07T19:44:37.0596955Z libllvm16 conda-forge/linux-64::libllvm16-16.0.6-hb3ce162_3 2025-05-07T19:44:37.0599056Z libxml2 conda-forge/linux-64::libxml2-2.12.7-hc051c1a_1 2025-05-07T19:44:37.0599518Z libzlib conda-forge/linux-64::libzlib-1.2.13-h4ab18f5_6 2025-05-07T19:44:37.0600027Z llvm-openmp conda-forge/linux-64::llvm-openmp-16.0.6-h4dfa4b3_0 2025-05-07T19:44:37.0600492Z zstd conda-forge/linux-64::zstd-1.5.6-ha6fb4c9_0 2025-05-07T19:44:37.0600770Z 2025-05-07T19:44:37.0600899Z The following packages will be UPDATED: 2025-05-07T19:44:37.0601119Z 2025-05-07T19:44:37.0601413Z zlib pkgs/main::zlib-1.2.13-h5eee18b_1 --> conda-forge::zlib-1.2.13-h4ab18f5_6 2025-05-07T19:44:37.0601786Z 2025-05-07T19:44:37.0601790Z 2025-05-07T19:44:37.0601793Z 2025-05-07T19:44:37.0601955Z Downloading and Extracting Packages: ...working... 2025-05-07T19:44:37.0602413Z llvm-openmp-16.0.6 | 39.9 MB | | 0% 2025-05-07T19:44:37.0602691Z 2025-05-07T19:44:37.0603162Z compiler-rt_linux-64 | 36.0 MB | | 0%  2025-05-07T19:44:37.0603438Z 2025-05-07T19:44:37.0603452Z 2025-05-07T19:44:37.0603685Z libllvm16-16.0.6 | 33.7 MB | | 0%  2025-05-07T19:44:37.0603997Z 2025-05-07T19:44:37.0604000Z 2025-05-07T19:44:37.0604004Z 2025-05-07T19:44:37.0617427Z libclang-cpp16-16.0. | 17.3 MB | | 0%  2025-05-07T19:44:37.0617734Z 2025-05-07T19:44:37.0617738Z 2025-05-07T19:44:37.0617742Z 2025-05-07T19:44:37.0617868Z 2025-05-07T19:44:37.0624780Z icu-73.2 | 11.5 MB | | 0%  2025-05-07T19:44:37.0625060Z 2025-05-07T19:44:37.0625065Z 2025-05-07T19:44:37.0625070Z 2025-05-07T19:44:37.0625074Z 2025-05-07T19:44:37.0625081Z 2025-05-07T19:44:37.0628519Z libcxx-19.1.7 | 1000 KB | | 0%  2025-05-07T19:44:37.0628815Z 2025-05-07T19:44:37.0628819Z 2025-05-07T19:44:37.0628822Z 2025-05-07T19:44:37.0628825Z 2025-05-07T19:44:37.0628829Z 2025-05-07T19:44:37.0628832Z 2025-05-07T19:44:37.0632078Z clang-16-16.0.6 | 780 KB | | 0%  2025-05-07T19:44:37.0632400Z 2025-05-07T19:44:37.0632404Z 2025-05-07T19:44:37.0632407Z 2025-05-07T19:44:37.0632411Z 2025-05-07T19:44:37.0632414Z 2025-05-07T19:44:37.0632417Z 2025-05-07T19:44:37.0632424Z 2025-05-07T19:44:37.0632746Z libiconv-1.18 | 696 KB | | 0%  2025-05-07T19:44:37.0633068Z 2025-05-07T19:44:37.0633072Z 2025-05-07T19:44:37.0633082Z 2025-05-07T19:44:37.0633095Z 2025-05-07T19:44:37.0633098Z 2025-05-07T19:44:37.0633102Z 2025-05-07T19:44:37.0633105Z 2025-05-07T19:44:37.0633108Z 2025-05-07T19:44:37.0636425Z libxml2-2.12.7 | 688 KB | | 0%  2025-05-07T19:44:37.0636751Z 2025-05-07T19:44:37.0636754Z 2025-05-07T19:44:37.0636758Z 2025-05-07T19:44:37.0636761Z 2025-05-07T19:44:37.0636765Z 2025-05-07T19:44:37.0636768Z 2025-05-07T19:44:37.0636771Z 2025-05-07T19:44:37.0636774Z 2025-05-07T19:44:37.0639334Z 2025-05-07T19:44:37.0639633Z zstd-1.5.6 | 542 KB | | 0%  2025-05-07T19:44:37.0639953Z 2025-05-07T19:44:37.0639970Z 2025-05-07T19:44:37.0639973Z 2025-05-07T19:44:37.0639977Z 2025-05-07T19:44:37.0639980Z 2025-05-07T19:44:37.0639983Z 2025-05-07T19:44:37.0639987Z 2025-05-07T19:44:37.0639990Z 2025-05-07T19:44:37.0639993Z 2025-05-07T19:44:37.0639997Z 2025-05-07T19:44:37.0643084Z libcxxabi-19.1.7 | 158 KB | | 0%  2025-05-07T19:44:37.0643405Z 2025-05-07T19:44:37.0643408Z 2025-05-07T19:44:37.0643412Z 2025-05-07T19:44:37.0643415Z 2025-05-07T19:44:37.0643419Z 2025-05-07T19:44:37.0643422Z 2025-05-07T19:44:37.0643426Z 2025-05-07T19:44:37.0643429Z 2025-05-07T19:44:37.0643433Z 2025-05-07T19:44:37.0643436Z 2025-05-07T19:44:37.0643439Z 2025-05-07T19:44:37.0644031Z clang-16.0.6 | 110 KB | | 0%  2025-05-07T19:44:37.0644323Z 2025-05-07T19:44:37.0644332Z 2025-05-07T19:44:37.0644336Z 2025-05-07T19:44:37.0644339Z 2025-05-07T19:44:37.0644347Z 2025-05-07T19:44:37.0644350Z 2025-05-07T19:44:37.0644354Z 2025-05-07T19:44:37.0644357Z 2025-05-07T19:44:37.0644361Z 2025-05-07T19:44:37.0644393Z 2025-05-07T19:44:37.0644397Z 2025-05-07T19:44:37.0644400Z 2025-05-07T19:44:37.0646407Z clangxx-16.0.6 | 110 KB | | 0%  2025-05-07T19:44:37.0646714Z 2025-05-07T19:44:37.0646725Z 2025-05-07T19:44:37.0646733Z 2025-05-07T19:44:37.0646737Z 2025-05-07T19:44:37.0646770Z 2025-05-07T19:44:37.0646773Z 2025-05-07T19:44:37.0646776Z 2025-05-07T19:44:37.0646780Z 2025-05-07T19:44:37.0646783Z 2025-05-07T19:44:37.0646787Z 2025-05-07T19:44:37.0646790Z 2025-05-07T19:44:37.0646793Z 2025-05-07T19:44:37.0646796Z 2025-05-07T19:44:37.0649976Z compiler-rt-16.0.6 | 107 KB | | 0%  2025-05-07T19:44:37.0650336Z 2025-05-07T19:44:37.0650339Z 2025-05-07T19:44:37.0650342Z 2025-05-07T19:44:37.0650346Z 2025-05-07T19:44:37.0650349Z 2025-05-07T19:44:37.0650362Z 2025-05-07T19:44:37.0650366Z 2025-05-07T19:44:37.0650369Z 2025-05-07T19:44:37.0650372Z 2025-05-07T19:44:37.0650376Z 2025-05-07T19:44:37.0650379Z 2025-05-07T19:44:37.0650382Z 2025-05-07T19:44:37.0650386Z 2025-05-07T19:44:37.0650389Z 2025-05-07T19:44:37.0651066Z zlib-1.2.13 | 91 KB | | 0%  2025-05-07T19:44:37.0651399Z 2025-05-07T19:44:37.0651471Z 2025-05-07T19:44:37.0651475Z 2025-05-07T19:44:37.0651478Z 2025-05-07T19:44:37.0651481Z 2025-05-07T19:44:37.0651485Z 2025-05-07T19:44:37.0651488Z 2025-05-07T19:44:37.0651491Z 2025-05-07T19:44:37.0651495Z 2025-05-07T19:44:37.0651498Z 2025-05-07T19:44:37.0651501Z 2025-05-07T19:44:37.0651504Z 2025-05-07T19:44:37.0651508Z 2025-05-07T19:44:37.0651511Z 2025-05-07T19:44:37.0651514Z 2025-05-07T19:44:37.2177977Z libzlib-1.2.13 | 60 KB | | 0%  2025-05-07T19:44:37.2178343Z 2025-05-07T19:44:37.2230404Z 2025-05-07T19:44:37.2610805Z libllvm16-16.0.6 | 33.7 MB | | 0%  2025-05-07T19:44:37.2611248Z 2025-05-07T19:44:37.2611431Z 2025-05-07T19:44:37.2716667Z 2025-05-07T19:44:37.3182604Z libclang-cpp16-16.0. | 17.3 MB | | 0%  2025-05-07T19:44:37.3182970Z 2025-05-07T19:44:37.3182983Z 2025-05-07T19:44:37.3540390Z libllvm16-16.0.6 | 33.7 MB | 2 | 3%  2025-05-07T19:44:37.3583739Z llvm-openmp-16.0.6 | 39.9 MB | | 0% 2025-05-07T19:44:37.3584053Z 2025-05-07T19:44:37.3584058Z 2025-05-07T19:44:37.3584061Z 2025-05-07T19:44:37.3584065Z 2025-05-07T19:44:37.3610149Z icu-73.2 | 11.5 MB | | 0%  2025-05-07T19:44:37.3610642Z 2025-05-07T19:44:37.3610816Z 2025-05-07T19:44:37.3610821Z 2025-05-07T19:44:37.3766661Z libclang-cpp16-16.0. | 17.3 MB | #9 | 20%  2025-05-07T19:44:37.3767539Z 2025-05-07T19:44:37.4184822Z compiler-rt_linux-64 | 36.0 MB | | 0%  2025-05-07T19:44:37.4185163Z 2025-05-07T19:44:37.4185169Z 2025-05-07T19:44:37.4545810Z libllvm16-16.0.6 | 33.7 MB | #4 | 14%  2025-05-07T19:44:37.4584207Z llvm-openmp-16.0.6 | 39.9 MB | 8 | 9% 2025-05-07T19:44:37.4584509Z 2025-05-07T19:44:37.4584541Z 2025-05-07T19:44:37.4584551Z 2025-05-07T19:44:37.4584556Z 2025-05-07T19:44:37.4610364Z icu-73.2 | 11.5 MB | ###### | 60%  2025-05-07T19:44:37.4610679Z 2025-05-07T19:44:37.4610802Z 2025-05-07T19:44:37.4610806Z 2025-05-07T19:44:37.4767682Z libclang-cpp16-16.0. | 17.3 MB | #####2 | 53%  2025-05-07T19:44:37.4768007Z 2025-05-07T19:44:37.5185070Z compiler-rt_linux-64 | 36.0 MB | #7 | 17%  2025-05-07T19:44:37.5185390Z 2025-05-07T19:44:37.5185398Z 2025-05-07T19:44:37.5544015Z libllvm16-16.0.6 | 33.7 MB | ##6 | 27%  2025-05-07T19:44:37.5613164Z llvm-openmp-16.0.6 | 39.9 MB | ##2 | 22% 2025-05-07T19:44:37.5613661Z 2025-05-07T19:44:37.5613726Z 2025-05-07T19:44:37.5613756Z 2025-05-07T19:44:37.5794907Z libclang-cpp16-16.0. | 17.3 MB | ########4 | 85%  2025-05-07T19:44:37.5795266Z 2025-05-07T19:44:37.6186987Z compiler-rt_linux-64 | 36.0 MB | ###2 | 32%  2025-05-07T19:44:37.6187350Z 2025-05-07T19:44:37.6187356Z 2025-05-07T19:44:37.6289493Z libllvm16-16.0.6 | 33.7 MB | ###5 | 36%  2025-05-07T19:44:37.6289887Z 2025-05-07T19:44:37.6289967Z 2025-05-07T19:44:37.6289979Z 2025-05-07T19:44:37.6290135Z 2025-05-07T19:44:37.6290628Z icu-73.2 | 11.5 MB | ########## | 100%  2025-05-07T19:44:37.6290914Z 2025-05-07T19:44:37.6290918Z 2025-05-07T19:44:37.6290922Z 2025-05-07T19:44:37.6290926Z 2025-05-07T19:44:37.6545689Z icu-73.2 | 11.5 MB | ########## | 100%  2025-05-07T19:44:37.6711270Z llvm-openmp-16.0.6 | 39.9 MB | #####7 | 58% 2025-05-07T19:44:37.6711759Z 2025-05-07T19:44:37.6711776Z 2025-05-07T19:44:37.6711782Z 2025-05-07T19:44:37.6711788Z 2025-05-07T19:44:37.6711825Z 2025-05-07T19:44:37.6796616Z libcxx-19.1.7 | 1000 KB | 1 | 2%  2025-05-07T19:44:37.6796923Z 2025-05-07T19:44:37.6977642Z compiler-rt_linux-64 | 36.0 MB | ##### | 51%  2025-05-07T19:44:37.6977998Z 2025-05-07T19:44:37.6978004Z 2025-05-07T19:44:37.6978010Z 2025-05-07T19:44:37.6978016Z 2025-05-07T19:44:37.6979615Z 2025-05-07T19:44:37.7433446Z libcxx-19.1.7 | 1000 KB | ########## | 100%  2025-05-07T19:44:37.7433789Z 2025-05-07T19:44:37.7433794Z 2025-05-07T19:44:37.7433798Z 2025-05-07T19:44:37.7433803Z 2025-05-07T19:44:37.7433808Z 2025-05-07T19:44:37.7433812Z 2025-05-07T19:44:37.7546122Z clang-16-16.0.6 | 780 KB | 2 | 2%  2025-05-07T19:44:37.7564198Z llvm-openmp-16.0.6 | 39.9 MB | ########1 | 81% 2025-05-07T19:44:37.7564555Z 2025-05-07T19:44:37.7564658Z 2025-05-07T19:44:37.7564668Z 2025-05-07T19:44:37.7704946Z libclang-cpp16-16.0. | 17.3 MB | ########## | 100%  2025-05-07T19:44:37.7705275Z 2025-05-07T19:44:37.7705280Z 2025-05-07T19:44:37.7705284Z 2025-05-07T19:44:37.7705289Z 2025-05-07T19:44:37.7705294Z 2025-05-07T19:44:37.7705329Z 2025-05-07T19:44:37.7969260Z clang-16-16.0.6 | 780 KB | ########## | 100%  2025-05-07T19:44:37.7969749Z 2025-05-07T19:44:37.7969801Z 2025-05-07T19:44:37.7969810Z 2025-05-07T19:44:37.7969817Z 2025-05-07T19:44:37.7969853Z 2025-05-07T19:44:37.7969857Z 2025-05-07T19:44:37.7969861Z 2025-05-07T19:44:37.7974222Z libiconv-1.18 | 696 KB | 2 | 2%  2025-05-07T19:44:37.7974537Z 2025-05-07T19:44:37.7974542Z 2025-05-07T19:44:37.7974546Z 2025-05-07T19:44:37.7974549Z 2025-05-07T19:44:37.7974564Z 2025-05-07T19:44:37.7977788Z libcxx-19.1.7 | 1000 KB | ########## | 100%  2025-05-07T19:44:37.7978100Z 2025-05-07T19:44:37.7978104Z 2025-05-07T19:44:37.7978108Z 2025-05-07T19:44:37.7978111Z 2025-05-07T19:44:37.7978122Z 2025-05-07T19:44:37.8131220Z libcxx-19.1.7 | 1000 KB | ########## | 100%  2025-05-07T19:44:37.8131557Z 2025-05-07T19:44:37.8148736Z compiler-rt_linux-64 | 36.0 MB | ######5 | 65%  2025-05-07T19:44:37.8149044Z 2025-05-07T19:44:37.8149048Z 2025-05-07T19:44:37.8170355Z libllvm16-16.0.6 | 33.7 MB | ####4 | 44%  2025-05-07T19:44:37.8170691Z 2025-05-07T19:44:37.8170695Z 2025-05-07T19:44:37.8170716Z 2025-05-07T19:44:37.8170720Z 2025-05-07T19:44:37.8170724Z 2025-05-07T19:44:37.8170727Z 2025-05-07T19:44:37.8170731Z 2025-05-07T19:44:37.8170734Z 2025-05-07T19:44:37.8376992Z libxml2-2.12.7 | 688 KB | 2 | 2%  2025-05-07T19:44:37.8377360Z 2025-05-07T19:44:37.8377365Z 2025-05-07T19:44:37.8377369Z 2025-05-07T19:44:37.8377372Z 2025-05-07T19:44:37.8377376Z 2025-05-07T19:44:37.8377379Z 2025-05-07T19:44:37.8378707Z 2025-05-07T19:44:37.8578978Z libiconv-1.18 | 696 KB | ########## | 100%  2025-05-07T19:44:37.8579316Z 2025-05-07T19:44:37.8579321Z 2025-05-07T19:44:37.8579325Z 2025-05-07T19:44:37.8579328Z 2025-05-07T19:44:37.8579332Z 2025-05-07T19:44:37.8579335Z 2025-05-07T19:44:37.8579338Z 2025-05-07T19:44:37.8579342Z 2025-05-07T19:44:37.8855976Z libxml2-2.12.7 | 688 KB | ########## | 100%  2025-05-07T19:44:37.8856321Z 2025-05-07T19:44:37.8856327Z 2025-05-07T19:44:37.8856351Z 2025-05-07T19:44:37.8856355Z 2025-05-07T19:44:37.8856359Z 2025-05-07T19:44:37.8856362Z 2025-05-07T19:44:37.8856366Z 2025-05-07T19:44:37.8856369Z 2025-05-07T19:44:37.8856480Z 2025-05-07T19:44:37.9025206Z zstd-1.5.6 | 542 KB | 2 | 3%  2025-05-07T19:44:37.9025525Z 2025-05-07T19:44:37.9025530Z 2025-05-07T19:44:37.9025534Z 2025-05-07T19:44:37.9025538Z 2025-05-07T19:44:37.9025541Z 2025-05-07T19:44:37.9025545Z 2025-05-07T19:44:37.9025548Z 2025-05-07T19:44:37.9025553Z 2025-05-07T19:44:37.9025577Z 2025-05-07T19:44:37.9025580Z 2025-05-07T19:44:37.9131579Z libcxxabi-19.1.7 | 158 KB | # | 10%  2025-05-07T19:44:37.9132551Z 2025-05-07T19:44:37.9132565Z 2025-05-07T19:44:37.9132577Z 2025-05-07T19:44:37.9132589Z 2025-05-07T19:44:37.9132599Z 2025-05-07T19:44:37.9132609Z 2025-05-07T19:44:37.9132650Z 2025-05-07T19:44:37.9132661Z 2025-05-07T19:44:37.9132671Z 2025-05-07T19:44:37.9132681Z 2025-05-07T19:44:37.9158313Z libcxxabi-19.1.7 | 158 KB | ########## | 100%  2025-05-07T19:44:37.9158650Z 2025-05-07T19:44:37.9160811Z 2025-05-07T19:44:37.9194126Z libllvm16-16.0.6 | 33.7 MB | #####5 | 55%  2025-05-07T19:44:37.9194528Z 2025-05-07T19:44:37.9194754Z 2025-05-07T19:44:37.9194762Z 2025-05-07T19:44:37.9194775Z 2025-05-07T19:44:37.9194779Z 2025-05-07T19:44:37.9194783Z 2025-05-07T19:44:37.9194787Z 2025-05-07T19:44:37.9194931Z 2025-05-07T19:44:37.9194939Z 2025-05-07T19:44:37.9383882Z zstd-1.5.6 | 542 KB | ########## | 100%  2025-05-07T19:44:37.9384237Z 2025-05-07T19:44:37.9653104Z compiler-rt_linux-64 | 36.0 MB | #######8 | 79%  2025-05-07T19:44:37.9653409Z 2025-05-07T19:44:37.9653625Z 2025-05-07T19:44:37.9653633Z 2025-05-07T19:44:37.9653638Z 2025-05-07T19:44:37.9653643Z 2025-05-07T19:44:37.9653648Z 2025-05-07T19:44:37.9653652Z 2025-05-07T19:44:37.9653656Z 2025-05-07T19:44:37.9653669Z 2025-05-07T19:44:37.9653696Z 2025-05-07T19:44:37.9653774Z 2025-05-07T19:44:37.9677578Z clang-16.0.6 | 110 KB | #4 | 15%  2025-05-07T19:44:37.9678544Z 2025-05-07T19:44:37.9678558Z 2025-05-07T19:44:37.9678573Z 2025-05-07T19:44:37.9678585Z 2025-05-07T19:44:37.9678596Z 2025-05-07T19:44:37.9678606Z 2025-05-07T19:44:37.9678616Z 2025-05-07T19:44:37.9678626Z 2025-05-07T19:44:37.9678637Z 2025-05-07T19:44:37.9678647Z 2025-05-07T19:44:37.9678658Z 2025-05-07T19:44:37.9678669Z 2025-05-07T19:44:37.9722439Z clangxx-16.0.6 | 110 KB | #4 | 15%  2025-05-07T19:44:37.9722784Z 2025-05-07T19:44:37.9722789Z 2025-05-07T19:44:37.9722793Z 2025-05-07T19:44:37.9722797Z 2025-05-07T19:44:37.9722800Z 2025-05-07T19:44:37.9722804Z 2025-05-07T19:44:37.9722807Z 2025-05-07T19:44:37.9722811Z 2025-05-07T19:44:37.9722814Z 2025-05-07T19:44:37.9722818Z 2025-05-07T19:44:37.9722822Z 2025-05-07T19:44:37.9737225Z clang-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:37.9737539Z 2025-05-07T19:44:37.9737542Z 2025-05-07T19:44:37.9737546Z 2025-05-07T19:44:37.9737549Z 2025-05-07T19:44:37.9737553Z 2025-05-07T19:44:37.9737564Z 2025-05-07T19:44:37.9737568Z 2025-05-07T19:44:37.9737571Z 2025-05-07T19:44:37.9737574Z 2025-05-07T19:44:37.9737578Z 2025-05-07T19:44:37.9737602Z 2025-05-07T19:44:37.9737605Z 2025-05-07T19:44:38.0158845Z clangxx-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:38.0159184Z 2025-05-07T19:44:38.0159212Z 2025-05-07T19:44:38.0175171Z libllvm16-16.0.6 | 33.7 MB | #######1 | 71%  2025-05-07T19:44:38.0175764Z 2025-05-07T19:44:38.0175770Z 2025-05-07T19:44:38.0175775Z 2025-05-07T19:44:38.0175781Z 2025-05-07T19:44:38.0175785Z 2025-05-07T19:44:38.0175793Z 2025-05-07T19:44:38.0175797Z 2025-05-07T19:44:38.0175877Z 2025-05-07T19:44:38.0175881Z 2025-05-07T19:44:38.0175886Z 2025-05-07T19:44:38.0175891Z 2025-05-07T19:44:38.0175914Z 2025-05-07T19:44:38.0175919Z 2025-05-07T19:44:38.0175924Z 2025-05-07T19:44:38.0225590Z zlib-1.2.13 | 91 KB | #7 | 18%  2025-05-07T19:44:38.0225914Z 2025-05-07T19:44:38.0225940Z 2025-05-07T19:44:38.0225944Z 2025-05-07T19:44:38.0225948Z 2025-05-07T19:44:38.0225951Z 2025-05-07T19:44:38.0225955Z 2025-05-07T19:44:38.0225959Z 2025-05-07T19:44:38.0225962Z 2025-05-07T19:44:38.0225966Z 2025-05-07T19:44:38.0225969Z 2025-05-07T19:44:38.0225973Z 2025-05-07T19:44:38.0225976Z 2025-05-07T19:44:38.0225980Z 2025-05-07T19:44:38.0226005Z 2025-05-07T19:44:38.0262835Z zlib-1.2.13 | 91 KB | ########## | 100%  2025-05-07T19:44:38.0263196Z 2025-05-07T19:44:38.0263217Z 2025-05-07T19:44:38.0263221Z 2025-05-07T19:44:38.0263225Z 2025-05-07T19:44:38.0263228Z 2025-05-07T19:44:38.0263232Z 2025-05-07T19:44:38.0263235Z 2025-05-07T19:44:38.0263239Z 2025-05-07T19:44:38.0263475Z 2025-05-07T19:44:38.0263479Z 2025-05-07T19:44:38.0263482Z 2025-05-07T19:44:38.0263486Z 2025-05-07T19:44:38.0263489Z 2025-05-07T19:44:38.0312116Z compiler-rt-16.0.6 | 107 KB | #4 | 15%  2025-05-07T19:44:38.0313245Z 2025-05-07T19:44:38.0313270Z 2025-05-07T19:44:38.0313479Z 2025-05-07T19:44:38.0313500Z 2025-05-07T19:44:38.0313516Z 2025-05-07T19:44:38.0313530Z 2025-05-07T19:44:38.0313545Z 2025-05-07T19:44:38.0313560Z 2025-05-07T19:44:38.0313573Z 2025-05-07T19:44:38.0313586Z 2025-05-07T19:44:38.0313600Z 2025-05-07T19:44:38.0313613Z 2025-05-07T19:44:38.0314046Z 2025-05-07T19:44:38.0387709Z compiler-rt-16.0.6 | 107 KB | ########## | 100%  2025-05-07T19:44:38.0388297Z 2025-05-07T19:44:38.0621224Z compiler-rt_linux-64 | 36.0 MB | #########6 | 96%  2025-05-07T19:44:38.0621563Z 2025-05-07T19:44:38.0621569Z 2025-05-07T19:44:38.0621573Z 2025-05-07T19:44:38.0621576Z 2025-05-07T19:44:38.0621580Z 2025-05-07T19:44:38.0621606Z 2025-05-07T19:44:38.0621609Z 2025-05-07T19:44:38.0621613Z 2025-05-07T19:44:38.0621616Z 2025-05-07T19:44:38.0621620Z 2025-05-07T19:44:38.0621623Z 2025-05-07T19:44:38.0621627Z 2025-05-07T19:44:38.0621631Z 2025-05-07T19:44:38.0621635Z 2025-05-07T19:44:38.0621639Z 2025-05-07T19:44:38.0645169Z libzlib-1.2.13 | 60 KB | ##6 | 27%  2025-05-07T19:44:38.0645587Z 2025-05-07T19:44:38.0645593Z 2025-05-07T19:44:38.0645597Z 2025-05-07T19:44:38.0645601Z 2025-05-07T19:44:38.0645604Z 2025-05-07T19:44:38.0645607Z 2025-05-07T19:44:38.0645628Z 2025-05-07T19:44:38.0645632Z 2025-05-07T19:44:38.0645635Z 2025-05-07T19:44:38.0645639Z 2025-05-07T19:44:38.0645642Z 2025-05-07T19:44:38.0645646Z 2025-05-07T19:44:38.0645649Z 2025-05-07T19:44:38.0645652Z 2025-05-07T19:44:38.0645656Z 2025-05-07T19:44:38.1161147Z libzlib-1.2.13 | 60 KB | ########## | 100%  2025-05-07T19:44:38.1161691Z 2025-05-07T19:44:38.1161706Z 2025-05-07T19:44:38.2236880Z libllvm16-16.0.6 | 33.7 MB | #########5 | 96%  2025-05-07T19:44:38.2237191Z 2025-05-07T19:44:38.2237322Z 2025-05-07T19:44:38.2237330Z 2025-05-07T19:44:38.2237337Z 2025-05-07T19:44:38.2237344Z 2025-05-07T19:44:38.2237485Z 2025-05-07T19:44:38.2237803Z clang-16-16.0.6 | 780 KB | ########## | 100%  2025-05-07T19:44:38.2238087Z 2025-05-07T19:44:38.2238092Z 2025-05-07T19:44:38.2238101Z 2025-05-07T19:44:38.2238106Z 2025-05-07T19:44:38.2238111Z 2025-05-07T19:44:38.2238116Z 2025-05-07T19:44:38.2879641Z clang-16-16.0.6 | 780 KB | ########## | 100%  2025-05-07T19:44:38.2879972Z 2025-05-07T19:44:38.2879978Z 2025-05-07T19:44:38.2879983Z 2025-05-07T19:44:38.2879987Z 2025-05-07T19:44:38.2879990Z 2025-05-07T19:44:38.2879994Z 2025-05-07T19:44:38.2879997Z 2025-05-07T19:44:38.2880255Z libiconv-1.18 | 696 KB | ########## | 100%  2025-05-07T19:44:38.2880584Z 2025-05-07T19:44:38.2880587Z 2025-05-07T19:44:38.2880591Z 2025-05-07T19:44:38.2880595Z 2025-05-07T19:44:38.2880598Z 2025-05-07T19:44:38.2880601Z 2025-05-07T19:44:38.2880605Z 2025-05-07T19:44:38.3304550Z libiconv-1.18 | 696 KB | ########## | 100%  2025-05-07T19:44:38.3304897Z 2025-05-07T19:44:38.3304902Z 2025-05-07T19:44:38.3304907Z 2025-05-07T19:44:38.3304910Z 2025-05-07T19:44:38.3587296Z icu-73.2 | 11.5 MB | ########## | 100%  2025-05-07T19:44:38.3587647Z 2025-05-07T19:44:38.3587652Z 2025-05-07T19:44:38.3587657Z 2025-05-07T19:44:38.3587683Z 2025-05-07T19:44:38.3587688Z 2025-05-07T19:44:38.3587692Z 2025-05-07T19:44:38.3587697Z 2025-05-07T19:44:38.3587701Z 2025-05-07T19:44:38.3589795Z libxml2-2.12.7 | 688 KB | ########## | 100%  2025-05-07T19:44:38.3590113Z 2025-05-07T19:44:38.3590117Z 2025-05-07T19:44:38.3590121Z 2025-05-07T19:44:38.3590124Z 2025-05-07T19:44:38.3590128Z 2025-05-07T19:44:38.3590354Z 2025-05-07T19:44:38.3590365Z 2025-05-07T19:44:38.3590368Z 2025-05-07T19:44:38.3685294Z libxml2-2.12.7 | 688 KB | ########## | 100%  2025-05-07T19:44:38.3685645Z 2025-05-07T19:44:38.3685650Z 2025-05-07T19:44:38.3685654Z 2025-05-07T19:44:38.3685658Z 2025-05-07T19:44:38.3685661Z 2025-05-07T19:44:38.3685664Z 2025-05-07T19:44:38.3685668Z 2025-05-07T19:44:38.3685671Z 2025-05-07T19:44:38.3685675Z 2025-05-07T19:44:38.3685678Z 2025-05-07T19:44:38.3687216Z libcxxabi-19.1.7 | 158 KB | ########## | 100%  2025-05-07T19:44:38.3687752Z 2025-05-07T19:44:38.3687758Z 2025-05-07T19:44:38.3687761Z 2025-05-07T19:44:38.3687765Z 2025-05-07T19:44:38.3687768Z 2025-05-07T19:44:38.3687771Z 2025-05-07T19:44:38.3687775Z 2025-05-07T19:44:38.3687779Z 2025-05-07T19:44:38.3687788Z 2025-05-07T19:44:38.3687791Z 2025-05-07T19:44:38.4007276Z libcxxabi-19.1.7 | 158 KB | ########## | 100%  2025-05-07T19:44:38.4070163Z llvm-openmp-16.0.6 | 39.9 MB | ########## | 100% 2025-05-07T19:44:38.4070482Z 2025-05-07T19:44:38.4070487Z 2025-05-07T19:44:38.4070513Z 2025-05-07T19:44:38.4070517Z 2025-05-07T19:44:38.4070521Z 2025-05-07T19:44:38.4070524Z 2025-05-07T19:44:38.4070528Z 2025-05-07T19:44:38.4070532Z 2025-05-07T19:44:38.4070536Z 2025-05-07T19:44:38.4073047Z zstd-1.5.6 | 542 KB | ########## | 100%  2025-05-07T19:44:38.4073334Z 2025-05-07T19:44:38.4073338Z 2025-05-07T19:44:38.4073364Z 2025-05-07T19:44:38.4073368Z 2025-05-07T19:44:38.4073371Z 2025-05-07T19:44:38.4073390Z 2025-05-07T19:44:38.4073398Z 2025-05-07T19:44:38.4073402Z 2025-05-07T19:44:38.4073405Z 2025-05-07T19:44:38.4481123Z zstd-1.5.6 | 542 KB | ########## | 100%  2025-05-07T19:44:38.4481903Z 2025-05-07T19:44:38.4720591Z compiler-rt_linux-64 | 36.0 MB | ########## | 100%  2025-05-07T19:44:38.4720948Z 2025-05-07T19:44:38.4720956Z 2025-05-07T19:44:38.4721011Z 2025-05-07T19:44:38.4721016Z 2025-05-07T19:44:38.4721021Z 2025-05-07T19:44:38.4721026Z 2025-05-07T19:44:38.4721047Z 2025-05-07T19:44:38.4721052Z 2025-05-07T19:44:38.4721058Z 2025-05-07T19:44:38.4721062Z 2025-05-07T19:44:38.4721068Z 2025-05-07T19:44:38.4721340Z clang-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:38.4721631Z 2025-05-07T19:44:38.4721635Z 2025-05-07T19:44:38.4721640Z 2025-05-07T19:44:38.4721645Z 2025-05-07T19:44:38.4721662Z 2025-05-07T19:44:38.4721667Z 2025-05-07T19:44:38.4721672Z 2025-05-07T19:44:38.4721678Z 2025-05-07T19:44:38.4721699Z 2025-05-07T19:44:38.4721703Z 2025-05-07T19:44:38.4721706Z 2025-05-07T19:44:38.4950402Z clang-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:38.4950733Z 2025-05-07T19:44:38.4950738Z 2025-05-07T19:44:38.4950742Z 2025-05-07T19:44:38.4950745Z 2025-05-07T19:44:38.4950749Z 2025-05-07T19:44:38.4950752Z 2025-05-07T19:44:38.4950755Z 2025-05-07T19:44:38.4950784Z 2025-05-07T19:44:38.4950788Z 2025-05-07T19:44:38.4950791Z 2025-05-07T19:44:38.4950795Z 2025-05-07T19:44:38.4950798Z 2025-05-07T19:44:38.4950802Z 2025-05-07T19:44:38.4950806Z 2025-05-07T19:44:38.4952238Z zlib-1.2.13 | 91 KB | ########## | 100%  2025-05-07T19:44:38.4952539Z 2025-05-07T19:44:38.4952542Z 2025-05-07T19:44:38.4952546Z 2025-05-07T19:44:38.4952549Z 2025-05-07T19:44:38.4952552Z 2025-05-07T19:44:38.4952556Z 2025-05-07T19:44:38.4952559Z 2025-05-07T19:44:38.4952562Z 2025-05-07T19:44:38.4952565Z 2025-05-07T19:44:38.4952579Z 2025-05-07T19:44:38.4952582Z 2025-05-07T19:44:38.4952593Z 2025-05-07T19:44:38.4952596Z 2025-05-07T19:44:38.4952600Z 2025-05-07T19:44:38.4956812Z zlib-1.2.13 | 91 KB | ########## | 100%  2025-05-07T19:44:38.4957089Z 2025-05-07T19:44:38.4957102Z 2025-05-07T19:44:38.5104121Z libllvm16-16.0.6 | 33.7 MB | ########## | 100%  2025-05-07T19:44:38.5104734Z 2025-05-07T19:44:38.5104739Z 2025-05-07T19:44:38.5104742Z 2025-05-07T19:44:38.5104746Z 2025-05-07T19:44:38.5104750Z 2025-05-07T19:44:38.5104753Z 2025-05-07T19:44:38.5104757Z 2025-05-07T19:44:38.5104760Z 2025-05-07T19:44:38.5104763Z 2025-05-07T19:44:38.5104767Z 2025-05-07T19:44:38.5104770Z 2025-05-07T19:44:38.5104774Z 2025-05-07T19:44:38.5105931Z clangxx-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:38.5106234Z 2025-05-07T19:44:38.5106238Z 2025-05-07T19:44:38.5106242Z 2025-05-07T19:44:38.5106245Z 2025-05-07T19:44:38.5106396Z 2025-05-07T19:44:38.5106400Z 2025-05-07T19:44:38.5106404Z 2025-05-07T19:44:38.5106407Z 2025-05-07T19:44:38.5106411Z 2025-05-07T19:44:38.5106416Z 2025-05-07T19:44:38.5106420Z 2025-05-07T19:44:38.5106435Z 2025-05-07T19:44:38.5263341Z clangxx-16.0.6 | 110 KB | ########## | 100%  2025-05-07T19:44:38.5263665Z 2025-05-07T19:44:38.5263670Z 2025-05-07T19:44:38.5263688Z 2025-05-07T19:44:38.5263692Z 2025-05-07T19:44:38.5263696Z 2025-05-07T19:44:38.5263700Z 2025-05-07T19:44:38.5263704Z 2025-05-07T19:44:38.5263707Z 2025-05-07T19:44:38.5263710Z 2025-05-07T19:44:38.5263730Z 2025-05-07T19:44:38.5263733Z 2025-05-07T19:44:38.5263737Z 2025-05-07T19:44:38.5263740Z 2025-05-07T19:44:38.5263744Z 2025-05-07T19:44:38.5263747Z 2025-05-07T19:44:38.5267785Z libzlib-1.2.13 | 60 KB | ########## | 100%  2025-05-07T19:44:38.5268202Z 2025-05-07T19:44:38.5268206Z 2025-05-07T19:44:38.5268223Z 2025-05-07T19:44:38.5268233Z 2025-05-07T19:44:38.5268237Z 2025-05-07T19:44:38.5268241Z 2025-05-07T19:44:38.5268244Z 2025-05-07T19:44:38.5268247Z 2025-05-07T19:44:38.5268251Z 2025-05-07T19:44:38.5268254Z 2025-05-07T19:44:38.5268257Z 2025-05-07T19:44:38.5268260Z 2025-05-07T19:44:38.5268276Z 2025-05-07T19:44:38.5268280Z 2025-05-07T19:44:38.5268283Z 2025-05-07T19:44:38.5279498Z libzlib-1.2.13 | 60 KB | ########## | 100%  2025-05-07T19:44:38.5279820Z 2025-05-07T19:44:38.5279824Z 2025-05-07T19:44:38.5279828Z 2025-05-07T19:44:38.5279831Z 2025-05-07T19:44:38.5279834Z 2025-05-07T19:44:38.5279853Z 2025-05-07T19:44:38.5279856Z 2025-05-07T19:44:38.5279860Z 2025-05-07T19:44:38.5279863Z 2025-05-07T19:44:38.5279866Z 2025-05-07T19:44:38.5279870Z 2025-05-07T19:44:38.5279873Z 2025-05-07T19:44:38.5279876Z 2025-05-07T19:44:38.5281736Z compiler-rt-16.0.6 | 107 KB | ########## | 100%  2025-05-07T19:44:38.5282052Z 2025-05-07T19:44:38.5282074Z 2025-05-07T19:44:38.5282078Z 2025-05-07T19:44:38.5282082Z 2025-05-07T19:44:38.5282085Z 2025-05-07T19:44:38.5282089Z 2025-05-07T19:44:38.5282092Z 2025-05-07T19:44:38.5282095Z 2025-05-07T19:44:38.5282099Z 2025-05-07T19:44:38.5282102Z 2025-05-07T19:44:38.5282106Z 2025-05-07T19:44:38.5282109Z 2025-05-07T19:44:38.5282130Z 2025-05-07T19:44:38.5724034Z compiler-rt-16.0.6 | 107 KB | ########## | 100%  2025-05-07T19:44:38.5724446Z 2025-05-07T19:44:38.5724452Z 2025-05-07T19:44:38.5724456Z 2025-05-07T19:44:39.0493066Z libclang-cpp16-16.0. | 17.3 MB | ########## | 100%  2025-05-07T19:44:39.0493488Z 2025-05-07T19:44:39.0956020Z compiler-rt_linux-64 | 36.0 MB | ########## | 100%  2025-05-07T19:44:39.0956484Z 2025-05-07T19:44:39.0956490Z 2025-05-07T19:44:39.1273949Z libllvm16-16.0.6 | 33.7 MB | ########## | 100%  2025-05-07T19:44:39.1282742Z llvm-openmp-16.0.6 | 39.9 MB | ########## | 100% 2025-05-07T19:44:39.1283256Z 2025-05-07T19:44:39.1283486Z 2025-05-07T19:44:39.1283773Z  2025-05-07T19:44:39.1283997Z 2025-05-07T19:44:39.1284001Z 2025-05-07T19:44:39.1284185Z  2025-05-07T19:44:39.1284436Z 2025-05-07T19:44:39.1284441Z 2025-05-07T19:44:39.1284733Z 2025-05-07T19:44:39.1284926Z  2025-05-07T19:44:39.1285158Z 2025-05-07T19:44:39.1285162Z 2025-05-07T19:44:39.1285166Z 2025-05-07T19:44:39.1285195Z 2025-05-07T19:44:39.1285385Z  2025-05-07T19:44:39.1285612Z 2025-05-07T19:44:39.1285616Z 2025-05-07T19:44:39.1285620Z 2025-05-07T19:44:39.1285623Z 2025-05-07T19:44:39.1285627Z 2025-05-07T19:44:39.1285838Z  2025-05-07T19:44:39.1286071Z 2025-05-07T19:44:39.1286204Z 2025-05-07T19:44:39.1286209Z 2025-05-07T19:44:39.1286212Z 2025-05-07T19:44:39.1286215Z 2025-05-07T19:44:39.1286219Z 2025-05-07T19:44:39.1286419Z  2025-05-07T19:44:39.1286677Z 2025-05-07T19:44:39.1286681Z 2025-05-07T19:44:39.1286684Z 2025-05-07T19:44:39.1286688Z 2025-05-07T19:44:39.1286692Z 2025-05-07T19:44:39.1286705Z 2025-05-07T19:44:39.1286709Z 2025-05-07T19:44:39.1286908Z  2025-05-07T19:44:39.1287171Z 2025-05-07T19:44:39.1287175Z 2025-05-07T19:44:39.1287179Z 2025-05-07T19:44:39.1287182Z 2025-05-07T19:44:39.1287186Z 2025-05-07T19:44:39.1287189Z 2025-05-07T19:44:39.1287193Z 2025-05-07T19:44:39.1287196Z 2025-05-07T19:44:39.1287389Z  2025-05-07T19:44:39.1287649Z 2025-05-07T19:44:39.1287653Z 2025-05-07T19:44:39.1287656Z 2025-05-07T19:44:39.1287659Z 2025-05-07T19:44:39.1287668Z 2025-05-07T19:44:39.1287672Z 2025-05-07T19:44:39.1287675Z 2025-05-07T19:44:39.1287680Z 2025-05-07T19:44:39.1287683Z 2025-05-07T19:44:39.1287889Z  2025-05-07T19:44:39.1288129Z 2025-05-07T19:44:39.1288153Z 2025-05-07T19:44:39.1288157Z 2025-05-07T19:44:39.1288161Z 2025-05-07T19:44:39.1288164Z 2025-05-07T19:44:39.1288177Z 2025-05-07T19:44:39.1288180Z 2025-05-07T19:44:39.1288184Z 2025-05-07T19:44:39.1288187Z 2025-05-07T19:44:39.1288229Z 2025-05-07T19:44:39.1288432Z  2025-05-07T19:44:39.1288698Z 2025-05-07T19:44:39.1288702Z 2025-05-07T19:44:39.1288705Z 2025-05-07T19:44:39.1288709Z 2025-05-07T19:44:39.1288712Z 2025-05-07T19:44:39.1288716Z 2025-05-07T19:44:39.1288719Z 2025-05-07T19:44:39.1288723Z 2025-05-07T19:44:39.1288726Z 2025-05-07T19:44:39.1288729Z 2025-05-07T19:44:39.1288732Z 2025-05-07T19:44:39.1288945Z  2025-05-07T19:44:39.1289214Z 2025-05-07T19:44:39.1289217Z 2025-05-07T19:44:39.1289221Z 2025-05-07T19:44:39.1289224Z 2025-05-07T19:44:39.1289228Z 2025-05-07T19:44:39.1289231Z 2025-05-07T19:44:39.1289235Z 2025-05-07T19:44:39.1289238Z 2025-05-07T19:44:39.1289241Z 2025-05-07T19:44:39.1289245Z 2025-05-07T19:44:39.1289255Z 2025-05-07T19:44:39.1289259Z 2025-05-07T19:44:39.1289494Z  2025-05-07T19:44:39.1289743Z 2025-05-07T19:44:39.1289746Z 2025-05-07T19:44:39.1289750Z 2025-05-07T19:44:39.1289753Z 2025-05-07T19:44:39.1289756Z 2025-05-07T19:44:39.1289760Z 2025-05-07T19:44:39.1289763Z 2025-05-07T19:44:39.1289766Z 2025-05-07T19:44:39.1289770Z 2025-05-07T19:44:39.1289773Z 2025-05-07T19:44:39.1289777Z 2025-05-07T19:44:39.1289780Z 2025-05-07T19:44:39.1289784Z 2025-05-07T19:44:39.1290021Z  2025-05-07T19:44:39.1290266Z 2025-05-07T19:44:39.1290270Z 2025-05-07T19:44:39.1290273Z 2025-05-07T19:44:39.1290276Z 2025-05-07T19:44:39.1290280Z 2025-05-07T19:44:39.1290283Z 2025-05-07T19:44:39.1290287Z 2025-05-07T19:44:39.1290290Z 2025-05-07T19:44:39.1290293Z 2025-05-07T19:44:39.1290297Z 2025-05-07T19:44:39.1290300Z 2025-05-07T19:44:39.1290387Z 2025-05-07T19:44:39.1290414Z 2025-05-07T19:44:39.1290417Z 2025-05-07T19:44:39.1290650Z  2025-05-07T19:44:39.1290899Z 2025-05-07T19:44:39.1290902Z 2025-05-07T19:44:39.1290906Z 2025-05-07T19:44:39.1290910Z 2025-05-07T19:44:39.1290913Z 2025-05-07T19:44:39.1290916Z 2025-05-07T19:44:39.1290920Z 2025-05-07T19:44:39.1290948Z 2025-05-07T19:44:39.1290951Z 2025-05-07T19:44:39.1290955Z 2025-05-07T19:44:39.1290958Z 2025-05-07T19:44:39.1290961Z 2025-05-07T19:44:39.1290965Z 2025-05-07T19:44:39.1290968Z 2025-05-07T19:44:39.1291073Z 2025-05-07T19:44:39.1291324Z  done 2025-05-07T19:44:39.2294425Z Preparing transaction: \ done 2025-05-07T19:44:39.3299559Z Verifying transaction: / done 2025-05-07T19:44:39.4322737Z Executing transaction: \ done 2025-05-07T19:44:39.5258690Z [INSTALL] Setting the C/C++ compiler symlinks ... 2025-05-07T19:44:43.2177371Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:44:43.2178940Z 2025-05-07T19:44:43.2185028Z 2025-05-07T19:44:43.2205432Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:44:43.2207002Z 2025-05-07T19:44:43.2216465Z 2025-05-07T19:44:43.2236236Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang++ /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:44:43.2237845Z 2025-05-07T19:44:43.2248786Z 2025-05-07T19:44:43.2263867Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang++ /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:44:43.2264438Z 2025-05-07T19:44:43.2275565Z 2025-05-07T19:44:43.2277543Z + conda env config vars set -n build_binary CC= 2025-05-07T19:44:43.2278324Z 2025-05-07T19:44:43.6408446Z 2025-05-07T19:44:43.6409159Z + conda env config vars set -n build_binary CXX= 2025-05-07T19:44:43.6409963Z 2025-05-07T19:44:44.0611871Z 2025-05-07T19:44:44.0612866Z + conda run -n build_binary printenv CC 2025-05-07T19:44:44.0613631Z 2025-05-07T19:44:45.8499356Z /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-cc 2025-05-07T19:44:45.8500472Z 2025-05-07T19:44:45.9068272Z 2025-05-07T19:44:45.9068806Z + conda run -n build_binary printenv CXX 2025-05-07T19:44:45.9069502Z 2025-05-07T19:44:47.6794761Z /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ 2025-05-07T19:44:47.6795218Z 2025-05-07T19:44:47.7361839Z 2025-05-07T19:44:49.5932832Z [ENV] Appending to LD_LIBRARY_PATH: /github/home/miniconda/envs/build_binary/lib ... 2025-05-07T19:44:51.3911973Z ERROR conda.cli.main_run:execute(125): `conda run printenv LD_LIBRARY_PATH` failed. (See above for error) 2025-05-07T19:44:51.4506131Z + conda env config vars set -n build_binary LD_LIBRARY_PATH=/github/home/miniconda/envs/build_binary/lib 2025-05-07T19:44:51.4506621Z 2025-05-07T19:44:51.8599402Z 2025-05-07T19:44:53.6665817Z /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:44:53.6666126Z 2025-05-07T19:44:53.7416543Z [CHECK] Binary cc found in PATH 2025-05-07T19:44:55.5497989Z /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:44:55.5498311Z 2025-05-07T19:44:55.6309723Z [CHECK] Binary gcc found in PATH 2025-05-07T19:44:57.4250336Z /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:44:57.4250702Z 2025-05-07T19:44:57.4990282Z [CHECK] Binary c++ found in PATH 2025-05-07T19:44:59.2816552Z /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:44:59.2817390Z 2025-05-07T19:44:59.3382987Z [CHECK] Binary g++ found in PATH 2025-05-07T19:44:59.3384578Z [INFO] Printing out all preprocessor defines in the C compiler ... 2025-05-07T19:44:59.3385055Z + conda run -n build_binary cc -dM -E - 2025-05-07T19:44:59.3385286Z 2025-05-07T19:45:01.1467604Z #define _LP64 1 2025-05-07T19:45:01.1467930Z #define __ATOMIC_ACQUIRE 2 2025-05-07T19:45:01.1468226Z #define __ATOMIC_ACQ_REL 4 2025-05-07T19:45:01.1468870Z #define __ATOMIC_CONSUME 1 2025-05-07T19:45:01.1469128Z #define __ATOMIC_RELAXED 0 2025-05-07T19:45:01.1469401Z #define __ATOMIC_RELEASE 3 2025-05-07T19:45:01.1469717Z #define __ATOMIC_SEQ_CST 5 2025-05-07T19:45:01.1470006Z #define __BIGGEST_ALIGNMENT__ 16 2025-05-07T19:45:01.1470317Z #define __BITINT_MAXWIDTH__ 8388608 2025-05-07T19:45:01.1470623Z #define __BOOL_WIDTH__ 8 2025-05-07T19:45:01.1470921Z #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:45:01.1471241Z #define __CHAR16_TYPE__ unsigned short 2025-05-07T19:45:01.1471549Z #define __CHAR32_TYPE__ unsigned int 2025-05-07T19:45:01.1472021Z #define __CHAR_BIT__ 8 2025-05-07T19:45:01.1472310Z #define __CLANG_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:45:01.1472631Z #define __CLANG_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:45:01.1472969Z #define __CLANG_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:45:01.1473272Z #define __CLANG_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:45:01.1473602Z #define __CLANG_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:45:01.1473943Z #define __CLANG_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:45:01.1474268Z #define __CLANG_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:45:01.1474596Z #define __CLANG_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:45:01.1475344Z #define __CLANG_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:45:01.1475694Z #define __CLANG_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:45:01.1476021Z #define __CONSTANT_CFSTRINGS__ 1 2025-05-07T19:45:01.1476346Z #define __DBL_DECIMAL_DIG__ 17 2025-05-07T19:45:01.1476661Z #define __DBL_DENORM_MIN__ 4.9406564584124654e-324 2025-05-07T19:45:01.1477033Z #define __DBL_DIG__ 15 2025-05-07T19:45:01.1477331Z #define __DBL_EPSILON__ 2.2204460492503131e-16 2025-05-07T19:45:01.1477667Z #define __DBL_HAS_DENORM__ 1 2025-05-07T19:45:01.1477963Z #define __DBL_HAS_INFINITY__ 1 2025-05-07T19:45:01.1478246Z #define __DBL_HAS_QUIET_NAN__ 1 2025-05-07T19:45:01.1478549Z #define __DBL_MANT_DIG__ 53 2025-05-07T19:45:01.1478824Z #define __DBL_MAX_10_EXP__ 308 2025-05-07T19:45:01.1479128Z #define __DBL_MAX_EXP__ 1024 2025-05-07T19:45:01.1479416Z #define __DBL_MAX__ 1.7976931348623157e+308 2025-05-07T19:45:01.1479766Z #define __DBL_MIN_10_EXP__ (-307) 2025-05-07T19:45:01.1480094Z #define __DBL_MIN_EXP__ (-1021) 2025-05-07T19:45:01.1480409Z #define __DBL_MIN__ 2.2250738585072014e-308 2025-05-07T19:45:01.1480762Z #define __DECIMAL_DIG__ __LDBL_DECIMAL_DIG__ 2025-05-07T19:45:01.1481083Z #define __ELF__ 1 2025-05-07T19:45:01.1481352Z #define __FINITE_MATH_ONLY__ 0 2025-05-07T19:45:01.1481745Z #define __FLOAT128__ 1 2025-05-07T19:45:01.1482008Z #define __FLT16_DECIMAL_DIG__ 5 2025-05-07T19:45:01.1482318Z #define __FLT16_DENORM_MIN__ 5.9604644775390625e-8F16 2025-05-07T19:45:01.1482671Z #define __FLT16_DIG__ 3 2025-05-07T19:45:01.1482918Z #define __FLT16_EPSILON__ 9.765625e-4F16 2025-05-07T19:45:01.1483236Z #define __FLT16_HAS_DENORM__ 1 2025-05-07T19:45:01.1483509Z #define __FLT16_HAS_INFINITY__ 1 2025-05-07T19:45:01.1483802Z #define __FLT16_HAS_QUIET_NAN__ 1 2025-05-07T19:45:01.1484094Z #define __FLT16_MANT_DIG__ 11 2025-05-07T19:45:01.1484367Z #define __FLT16_MAX_10_EXP__ 4 2025-05-07T19:45:01.1484896Z #define __FLT16_MAX_EXP__ 16 2025-05-07T19:45:01.1485150Z #define __FLT16_MAX__ 6.5504e+4F16 2025-05-07T19:45:01.1485439Z #define __FLT16_MIN_10_EXP__ (-4) 2025-05-07T19:45:01.1485704Z #define __FLT16_MIN_EXP__ (-13) 2025-05-07T19:45:01.1485989Z #define __FLT16_MIN__ 6.103515625e-5F16 2025-05-07T19:45:01.1486270Z #define __FLT_DECIMAL_DIG__ 9 2025-05-07T19:45:01.1486560Z #define __FLT_DENORM_MIN__ 1.40129846e-45F 2025-05-07T19:45:01.1486865Z #define __FLT_DIG__ 6 2025-05-07T19:45:01.1487108Z #define __FLT_EPSILON__ 1.19209290e-7F 2025-05-07T19:45:01.1487405Z #define __FLT_HAS_DENORM__ 1 2025-05-07T19:45:01.1487656Z #define __FLT_HAS_INFINITY__ 1 2025-05-07T19:45:01.1487938Z #define __FLT_HAS_QUIET_NAN__ 1 2025-05-07T19:45:01.1488193Z #define __FLT_MANT_DIG__ 24 2025-05-07T19:45:01.1488456Z #define __FLT_MAX_10_EXP__ 38 2025-05-07T19:45:01.1488708Z #define __FLT_MAX_EXP__ 128 2025-05-07T19:45:01.1488975Z #define __FLT_MAX__ 3.40282347e+38F 2025-05-07T19:45:01.1489424Z #define __FLT_MIN_10_EXP__ (-37) 2025-05-07T19:45:01.1489715Z #define __FLT_MIN_EXP__ (-125) 2025-05-07T19:45:01.1489995Z #define __FLT_MIN__ 1.17549435e-38F 2025-05-07T19:45:01.1490261Z #define __FLT_RADIX__ 2 2025-05-07T19:45:01.1490513Z #define __FXSR__ 1 2025-05-07T19:45:01.1490740Z #define __GCC_ASM_FLAG_OUTPUTS__ 1 2025-05-07T19:45:01.1491051Z #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:45:01.1491342Z #define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:45:01.1491658Z #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:45:01.1492058Z #define __GCC_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:45:01.1492360Z #define __GCC_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:45:01.1492644Z #define __GCC_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:45:01.1492950Z #define __GCC_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:45:01.1493260Z #define __GCC_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:45:01.1493557Z #define __GCC_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:45:01.1493878Z #define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 2025-05-07T19:45:01.1494188Z #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:45:01.1494501Z #define __GCC_HAVE_DWARF2_CFI_ASM 1 2025-05-07T19:45:01.1494792Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 2025-05-07T19:45:01.1495126Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 2025-05-07T19:45:01.1495567Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 2025-05-07T19:45:01.1496108Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 2025-05-07T19:45:01.1496460Z #define __GNUC_MINOR__ 2 2025-05-07T19:45:01.1496720Z #define __GNUC_PATCHLEVEL__ 1 2025-05-07T19:45:01.1497024Z #define __GNUC_STDC_INLINE__ 1 2025-05-07T19:45:01.1497286Z #define __GNUC__ 4 2025-05-07T19:45:01.1497536Z #define __GXX_ABI_VERSION 1002 2025-05-07T19:45:01.1497804Z #define __INT16_C_SUFFIX__ 2025-05-07T19:45:01.1498079Z #define __INT16_FMTd__ "hd" 2025-05-07T19:45:01.1498336Z #define __INT16_FMTi__ "hi" 2025-05-07T19:45:01.1498611Z #define __INT16_MAX__ 32767 2025-05-07T19:45:01.1498872Z #define __INT16_TYPE__ short 2025-05-07T19:45:01.1499159Z #define __INT32_C_SUFFIX__ 2025-05-07T19:45:01.1499415Z #define __INT32_FMTd__ "d" 2025-05-07T19:45:01.1499686Z #define __INT32_FMTi__ "i" 2025-05-07T19:45:01.1499963Z #define __INT32_MAX__ 2147483647 2025-05-07T19:45:01.1500239Z #define __INT32_TYPE__ int 2025-05-07T19:45:01.1500522Z #define __INT64_C_SUFFIX__ L 2025-05-07T19:45:01.1500787Z #define __INT64_FMTd__ "ld" 2025-05-07T19:45:01.1501073Z #define __INT64_FMTi__ "li" 2025-05-07T19:45:01.1501343Z #define __INT64_MAX__ 9223372036854775807L 2025-05-07T19:45:01.1501665Z #define __INT64_TYPE__ long int 2025-05-07T19:45:01.1502051Z #define __INT8_C_SUFFIX__ 2025-05-07T19:45:01.1502313Z #define __INT8_FMTd__ "hhd" 2025-05-07T19:45:01.1502551Z #define __INT8_FMTi__ "hhi" 2025-05-07T19:45:01.1502806Z #define __INT8_MAX__ 127 2025-05-07T19:45:01.1503066Z #define __INT8_TYPE__ signed char 2025-05-07T19:45:01.1503332Z #define __INTMAX_C_SUFFIX__ L 2025-05-07T19:45:01.1503605Z #define __INTMAX_FMTd__ "ld" 2025-05-07T19:45:01.1503861Z #define __INTMAX_FMTi__ "li" 2025-05-07T19:45:01.1504139Z #define __INTMAX_MAX__ 9223372036854775807L 2025-05-07T19:45:01.1504430Z #define __INTMAX_TYPE__ long int 2025-05-07T19:45:01.1504714Z #define __INTMAX_WIDTH__ 64 2025-05-07T19:45:01.1504964Z #define __INTPTR_FMTd__ "ld" 2025-05-07T19:45:01.1505236Z #define __INTPTR_FMTi__ "li" 2025-05-07T19:45:01.1505501Z #define __INTPTR_MAX__ 9223372036854775807L 2025-05-07T19:45:01.1505817Z #define __INTPTR_TYPE__ long int 2025-05-07T19:45:01.1506102Z #define __INTPTR_WIDTH__ 64 2025-05-07T19:45:01.1506356Z #define __INT_FAST16_FMTd__ "hd" 2025-05-07T19:45:01.1506648Z #define __INT_FAST16_FMTi__ "hi" 2025-05-07T19:45:01.1506912Z #define __INT_FAST16_MAX__ 32767 2025-05-07T19:45:01.1507205Z #define __INT_FAST16_TYPE__ short 2025-05-07T19:45:01.1507480Z #define __INT_FAST16_WIDTH__ 16 2025-05-07T19:45:01.1507958Z #define __INT_FAST32_FMTd__ "d" 2025-05-07T19:45:01.1508227Z #define __INT_FAST32_FMTi__ "i" 2025-05-07T19:45:01.1508531Z #define __INT_FAST32_MAX__ 2147483647 2025-05-07T19:45:01.1508937Z #define __INT_FAST32_TYPE__ int 2025-05-07T19:45:01.1509237Z #define __INT_FAST32_WIDTH__ 32 2025-05-07T19:45:01.1509531Z #define __INT_FAST64_FMTd__ "ld" 2025-05-07T19:45:01.1509811Z #define __INT_FAST64_FMTi__ "li" 2025-05-07T19:45:01.1510134Z #define __INT_FAST64_MAX__ 9223372036854775807L 2025-05-07T19:45:01.1510464Z #define __INT_FAST64_TYPE__ long int 2025-05-07T19:45:01.1510783Z #define __INT_FAST64_WIDTH__ 64 2025-05-07T19:45:01.1511063Z #define __INT_FAST8_FMTd__ "hhd" 2025-05-07T19:45:01.1511371Z #define __INT_FAST8_FMTi__ "hhi" 2025-05-07T19:45:01.1511727Z #define __INT_FAST8_MAX__ 127 2025-05-07T19:45:01.1512035Z #define __INT_FAST8_TYPE__ signed char 2025-05-07T19:45:01.1512335Z #define __INT_FAST8_WIDTH__ 8 2025-05-07T19:45:01.1512621Z #define __INT_LEAST16_FMTd__ "hd" 2025-05-07T19:45:01.1512926Z #define __INT_LEAST16_FMTi__ "hi" 2025-05-07T19:45:01.1513204Z #define __INT_LEAST16_MAX__ 32767 2025-05-07T19:45:01.1513628Z #define __INT_LEAST16_TYPE__ short 2025-05-07T19:45:01.1513901Z #define __INT_LEAST16_WIDTH__ 16 2025-05-07T19:45:01.1514184Z #define __INT_LEAST32_FMTd__ "d" 2025-05-07T19:45:01.1514440Z #define __INT_LEAST32_FMTi__ "i" 2025-05-07T19:45:01.1514724Z #define __INT_LEAST32_MAX__ 2147483647 2025-05-07T19:45:01.1515006Z #define __INT_LEAST32_TYPE__ int 2025-05-07T19:45:01.1515286Z #define __INT_LEAST32_WIDTH__ 32 2025-05-07T19:45:01.1515550Z #define __INT_LEAST64_FMTd__ "ld" 2025-05-07T19:45:01.1515837Z #define __INT_LEAST64_FMTi__ "li" 2025-05-07T19:45:01.1516145Z #define __INT_LEAST64_MAX__ 9223372036854775807L 2025-05-07T19:45:01.1516467Z #define __INT_LEAST64_TYPE__ long int 2025-05-07T19:45:01.1516769Z #define __INT_LEAST64_WIDTH__ 64 2025-05-07T19:45:01.1517033Z #define __INT_LEAST8_FMTd__ "hhd" 2025-05-07T19:45:01.1517321Z #define __INT_LEAST8_FMTi__ "hhi" 2025-05-07T19:45:01.1517589Z #define __INT_LEAST8_MAX__ 127 2025-05-07T19:45:01.1517870Z #define __INT_LEAST8_TYPE__ signed char 2025-05-07T19:45:01.1518153Z #define __INT_LEAST8_WIDTH__ 8 2025-05-07T19:45:01.1518425Z #define __INT_MAX__ 2147483647 2025-05-07T19:45:01.1518687Z #define __INT_WIDTH__ 32 2025-05-07T19:45:01.1518934Z #define __LDBL_DECIMAL_DIG__ 21 2025-05-07T19:45:01.1519255Z #define __LDBL_DENORM_MIN__ 3.64519953188247460253e-4951L 2025-05-07T19:45:01.1519581Z #define __LDBL_DIG__ 18 2025-05-07T19:45:01.1519865Z #define __LDBL_EPSILON__ 1.08420217248550443401e-19L 2025-05-07T19:45:01.1520180Z #define __LDBL_HAS_DENORM__ 1 2025-05-07T19:45:01.1520454Z #define __LDBL_HAS_INFINITY__ 1 2025-05-07T19:45:01.1520715Z #define __LDBL_HAS_QUIET_NAN__ 1 2025-05-07T19:45:01.1520995Z #define __LDBL_MANT_DIG__ 64 2025-05-07T19:45:01.1521249Z #define __LDBL_MAX_10_EXP__ 4932 2025-05-07T19:45:01.1521527Z #define __LDBL_MAX_EXP__ 16384 2025-05-07T19:45:01.1521818Z #define __LDBL_MAX__ 1.18973149535723176502e+4932L 2025-05-07T19:45:01.1522127Z #define __LDBL_MIN_10_EXP__ (-4931) 2025-05-07T19:45:01.1522419Z #define __LDBL_MIN_EXP__ (-16381) 2025-05-07T19:45:01.1522708Z #define __LDBL_MIN__ 3.36210314311209350626e-4932L 2025-05-07T19:45:01.1523032Z #define __LITTLE_ENDIAN__ 1 2025-05-07T19:45:01.1523276Z #define __LLONG_WIDTH__ 64 2025-05-07T19:45:01.1523559Z #define __LONG_LONG_MAX__ 9223372036854775807LL 2025-05-07T19:45:01.1523863Z #define __LONG_MAX__ 9223372036854775807L 2025-05-07T19:45:01.1524156Z #define __LONG_WIDTH__ 64 2025-05-07T19:45:01.1524391Z #define __LP64__ 1 2025-05-07T19:45:01.1524615Z #define __MMX__ 1 2025-05-07T19:45:01.1524847Z #define __NO_INLINE__ 1 2025-05-07T19:45:01.1525080Z #define __NO_MATH_INLINES 1 2025-05-07T19:45:01.1525340Z #define __OBJC_BOOL_IS_BOOL 0 2025-05-07T19:45:01.1525625Z #define __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES 3 2025-05-07T19:45:01.1525959Z #define __OPENCL_MEMORY_SCOPE_DEVICE 2 2025-05-07T19:45:01.1526255Z #define __OPENCL_MEMORY_SCOPE_SUB_GROUP 4 2025-05-07T19:45:01.1526576Z #define __OPENCL_MEMORY_SCOPE_WORK_GROUP 1 2025-05-07T19:45:01.1526876Z #define __OPENCL_MEMORY_SCOPE_WORK_ITEM 0 2025-05-07T19:45:01.1527184Z #define __ORDER_BIG_ENDIAN__ 4321 2025-05-07T19:45:01.1527589Z #define __ORDER_LITTLE_ENDIAN__ 1234 2025-05-07T19:45:01.1527888Z #define __ORDER_PDP_ENDIAN__ 3412 2025-05-07T19:45:01.1528162Z #define __PIC__ 2 2025-05-07T19:45:01.1528371Z #define __PIE__ 2 2025-05-07T19:45:01.1528605Z #define __POINTER_WIDTH__ 64 2025-05-07T19:45:01.1528865Z #define __PRAGMA_REDEFINE_EXTNAME 1 2025-05-07T19:45:01.1529157Z #define __PTRDIFF_FMTd__ "ld" 2025-05-07T19:45:01.1529414Z #define __PTRDIFF_FMTi__ "li" 2025-05-07T19:45:01.1529709Z #define __PTRDIFF_MAX__ 9223372036854775807L 2025-05-07T19:45:01.1530004Z #define __PTRDIFF_TYPE__ long int 2025-05-07T19:45:01.1530394Z #define __PTRDIFF_WIDTH__ 64 2025-05-07T19:45:01.1530651Z #define __REGISTER_PREFIX__ 2025-05-07T19:45:01.1530922Z #define __SCHAR_MAX__ 127 2025-05-07T19:45:01.1531172Z #define __SEG_FS 1 2025-05-07T19:45:01.1531380Z #define __SEG_GS 1 2025-05-07T19:45:01.1531613Z #define __SHRT_MAX__ 32767 2025-05-07T19:45:01.1531856Z #define __SHRT_WIDTH__ 16 2025-05-07T19:45:01.1532117Z #define __SIG_ATOMIC_MAX__ 2147483647 2025-05-07T19:45:01.1532402Z #define __SIG_ATOMIC_WIDTH__ 32 2025-05-07T19:45:01.1532703Z #define __SIZEOF_DOUBLE__ 8 2025-05-07T19:45:01.1532952Z #define __SIZEOF_FLOAT128__ 16 2025-05-07T19:45:01.1533226Z #define __SIZEOF_FLOAT__ 4 2025-05-07T19:45:01.1533470Z #define __SIZEOF_INT128__ 16 2025-05-07T19:45:01.1533736Z #define __SIZEOF_INT__ 4 2025-05-07T19:45:01.1534001Z #define __SIZEOF_LONG_DOUBLE__ 16 2025-05-07T19:45:01.1534272Z #define __SIZEOF_LONG_LONG__ 8 2025-05-07T19:45:01.1534546Z #define __SIZEOF_LONG__ 8 2025-05-07T19:45:01.1534794Z #define __SIZEOF_POINTER__ 8 2025-05-07T19:45:01.1535075Z #define __SIZEOF_PTRDIFF_T__ 8 2025-05-07T19:45:01.1535446Z #define __SIZEOF_SHORT__ 2 2025-05-07T19:45:01.1535904Z #define __SIZEOF_SIZE_T__ 8 2025-05-07T19:45:01.1536175Z #define __SIZEOF_WCHAR_T__ 4 2025-05-07T19:45:01.1536571Z #define __SIZEOF_WINT_T__ 4 2025-05-07T19:45:01.1536840Z #define __SIZE_FMTX__ "lX" 2025-05-07T19:45:01.1537125Z #define __SIZE_FMTo__ "lo" 2025-05-07T19:45:01.1537414Z #define __SIZE_FMTu__ "lu" 2025-05-07T19:45:01.1537681Z #define __SIZE_FMTx__ "lx" 2025-05-07T19:45:01.1537972Z #define __SIZE_MAX__ 18446744073709551615UL 2025-05-07T19:45:01.1538297Z #define __SIZE_TYPE__ long unsigned int 2025-05-07T19:45:01.1538623Z #define __SIZE_WIDTH__ 64 2025-05-07T19:45:01.1538891Z #define __SSE2_MATH__ 1 2025-05-07T19:45:01.1539155Z #define __SSE2__ 1 2025-05-07T19:45:01.1539392Z #define __SSE_MATH__ 1 2025-05-07T19:45:01.1539662Z #define __SSE__ 1 2025-05-07T19:45:01.1539896Z #define __STDC_HOSTED__ 1 2025-05-07T19:45:01.1540174Z #define __STDC_UTF_16__ 1 2025-05-07T19:45:01.1540433Z #define __STDC_UTF_32__ 1 2025-05-07T19:45:01.1540708Z #define __STDC_VERSION__ 201710L 2025-05-07T19:45:01.1540999Z #define __STDC__ 1 2025-05-07T19:45:01.1541230Z #define __UINT16_C_SUFFIX__ 2025-05-07T19:45:01.1541524Z #define __UINT16_FMTX__ "hX" 2025-05-07T19:45:01.1541788Z #define __UINT16_FMTo__ "ho" 2025-05-07T19:45:01.1542071Z #define __UINT16_FMTu__ "hu" 2025-05-07T19:45:01.1542337Z #define __UINT16_FMTx__ "hx" 2025-05-07T19:45:01.1542617Z #define __UINT16_MAX__ 65535 2025-05-07T19:45:01.1542890Z #define __UINT16_TYPE__ unsigned short 2025-05-07T19:45:01.1543208Z #define __UINT32_C_SUFFIX__ U 2025-05-07T19:45:01.1543479Z #define __UINT32_FMTX__ "X" 2025-05-07T19:45:01.1543761Z #define __UINT32_FMTo__ "o" 2025-05-07T19:45:01.1544044Z #define __UINT32_FMTu__ "u" 2025-05-07T19:45:01.1544305Z #define __UINT32_FMTx__ "x" 2025-05-07T19:45:01.1544594Z #define __UINT32_MAX__ 4294967295U 2025-05-07T19:45:01.1544888Z #define __UINT32_TYPE__ unsigned int 2025-05-07T19:45:01.1545203Z #define __UINT64_C_SUFFIX__ UL 2025-05-07T19:45:01.1545474Z #define __UINT64_FMTX__ "lX" 2025-05-07T19:45:01.1545760Z #define __UINT64_FMTo__ "lo" 2025-05-07T19:45:01.1546025Z #define __UINT64_FMTu__ "lu" 2025-05-07T19:45:01.1546305Z #define __UINT64_FMTx__ "lx" 2025-05-07T19:45:01.1546588Z #define __UINT64_MAX__ 18446744073709551615UL 2025-05-07T19:45:01.1546935Z #define __UINT64_TYPE__ long unsigned int 2025-05-07T19:45:01.1547377Z #define __UINT8_C_SUFFIX__ 2025-05-07T19:45:01.1547645Z #define __UINT8_FMTX__ "hhX" 2025-05-07T19:45:01.1547936Z #define __UINT8_FMTo__ "hho" 2025-05-07T19:45:01.1548205Z #define __UINT8_FMTu__ "hhu" 2025-05-07T19:45:01.1548491Z #define __UINT8_FMTx__ "hhx" 2025-05-07T19:45:01.1548759Z #define __UINT8_MAX__ 255 2025-05-07T19:45:01.1549050Z #define __UINT8_TYPE__ unsigned char 2025-05-07T19:45:01.1549353Z #define __UINTMAX_C_SUFFIX__ UL 2025-05-07T19:45:01.1549655Z #define __UINTMAX_FMTX__ "lX" 2025-05-07T19:45:01.1550053Z #define __UINTMAX_FMTo__ "lo" 2025-05-07T19:45:01.1550515Z #define __UINTMAX_FMTu__ "lu" 2025-05-07T19:45:01.1550785Z #define __UINTMAX_FMTx__ "lx" 2025-05-07T19:45:01.1551053Z #define __UINTMAX_MAX__ 18446744073709551615UL 2025-05-07T19:45:01.1551390Z #define __UINTMAX_TYPE__ long unsigned int 2025-05-07T19:45:01.1551683Z #define __UINTMAX_WIDTH__ 64 2025-05-07T19:45:01.1551955Z #define __UINTPTR_FMTX__ "lX" 2025-05-07T19:45:01.1552215Z #define __UINTPTR_FMTo__ "lo" 2025-05-07T19:45:01.1552493Z #define __UINTPTR_FMTu__ "lu" 2025-05-07T19:45:01.1552744Z #define __UINTPTR_FMTx__ "lx" 2025-05-07T19:45:01.1553035Z #define __UINTPTR_MAX__ 18446744073709551615UL 2025-05-07T19:45:01.1553349Z #define __UINTPTR_TYPE__ long unsigned int 2025-05-07T19:45:01.1553661Z #define __UINTPTR_WIDTH__ 64 2025-05-07T19:45:01.1553936Z #define __UINT_FAST16_FMTX__ "hX" 2025-05-07T19:45:01.1554201Z #define __UINT_FAST16_FMTo__ "ho" 2025-05-07T19:45:01.1554483Z #define __UINT_FAST16_FMTu__ "hu" 2025-05-07T19:45:01.1554746Z #define __UINT_FAST16_FMTx__ "hx" 2025-05-07T19:45:01.1555027Z #define __UINT_FAST16_MAX__ 65535 2025-05-07T19:45:01.1555303Z #define __UINT_FAST16_TYPE__ unsigned short 2025-05-07T19:45:01.1555611Z #define __UINT_FAST32_FMTX__ "X" 2025-05-07T19:45:01.1555874Z #define __UINT_FAST32_FMTo__ "o" 2025-05-07T19:45:01.1556149Z #define __UINT_FAST32_FMTu__ "u" 2025-05-07T19:45:01.1556404Z #define __UINT_FAST32_FMTx__ "x" 2025-05-07T19:45:01.1556680Z #define __UINT_FAST32_MAX__ 4294967295U 2025-05-07T19:45:01.1556993Z #define __UINT_FAST32_TYPE__ unsigned int 2025-05-07T19:45:01.1557284Z #define __UINT_FAST64_FMTX__ "lX" 2025-05-07T19:45:01.1557564Z #define __UINT_FAST64_FMTo__ "lo" 2025-05-07T19:45:01.1557829Z #define __UINT_FAST64_FMTu__ "lu" 2025-05-07T19:45:01.1558114Z #define __UINT_FAST64_FMTx__ "lx" 2025-05-07T19:45:01.1558410Z #define __UINT_FAST64_MAX__ 18446744073709551615UL 2025-05-07T19:45:01.1558767Z #define __UINT_FAST64_TYPE__ long unsigned int 2025-05-07T19:45:01.1559076Z #define __UINT_FAST8_FMTX__ "hhX" 2025-05-07T19:45:01.1559360Z #define __UINT_FAST8_FMTo__ "hho" 2025-05-07T19:45:01.1559642Z #define __UINT_FAST8_FMTu__ "hhu" 2025-05-07T19:45:01.1559907Z #define __UINT_FAST8_FMTx__ "hhx" 2025-05-07T19:45:01.1560185Z #define __UINT_FAST8_MAX__ 255 2025-05-07T19:45:01.1560453Z #define __UINT_FAST8_TYPE__ unsigned char 2025-05-07T19:45:01.1560760Z #define __UINT_LEAST16_FMTX__ "hX" 2025-05-07T19:45:01.1561032Z #define __UINT_LEAST16_FMTo__ "ho" 2025-05-07T19:45:01.1561314Z #define __UINT_LEAST16_FMTu__ "hu" 2025-05-07T19:45:01.1561589Z #define __UINT_LEAST16_FMTx__ "hx" 2025-05-07T19:45:01.1561874Z #define __UINT_LEAST16_MAX__ 65535 2025-05-07T19:45:01.1562161Z #define __UINT_LEAST16_TYPE__ unsigned short 2025-05-07T19:45:01.1562478Z #define __UINT_LEAST32_FMTX__ "X" 2025-05-07T19:45:01.1562768Z #define __UINT_LEAST32_FMTo__ "o" 2025-05-07T19:45:01.1563035Z #define __UINT_LEAST32_FMTu__ "u" 2025-05-07T19:45:01.1563322Z #define __UINT_LEAST32_FMTx__ "x" 2025-05-07T19:45:01.1563597Z #define __UINT_LEAST32_MAX__ 4294967295U 2025-05-07T19:45:01.1563918Z #define __UINT_LEAST32_TYPE__ unsigned int 2025-05-07T19:45:01.1564232Z #define __UINT_LEAST64_FMTX__ "lX" 2025-05-07T19:45:01.1564530Z #define __UINT_LEAST64_FMTo__ "lo" 2025-05-07T19:45:01.1564803Z #define __UINT_LEAST64_FMTu__ "lu" 2025-05-07T19:45:01.1565101Z #define __UINT_LEAST64_FMTx__ "lx" 2025-05-07T19:45:01.1565421Z #define __UINT_LEAST64_MAX__ 18446744073709551615UL 2025-05-07T19:45:01.1565769Z #define __UINT_LEAST64_TYPE__ long unsigned int 2025-05-07T19:45:01.1566202Z #define __UINT_LEAST8_FMTX__ "hhX" 2025-05-07T19:45:01.1566669Z #define __UINT_LEAST8_FMTo__ "hho" 2025-05-07T19:45:01.1566972Z #define __UINT_LEAST8_FMTu__ "hhu" 2025-05-07T19:45:01.1567258Z #define __UINT_LEAST8_FMTx__ "hhx" 2025-05-07T19:45:01.1567566Z #define __UINT_LEAST8_MAX__ 255 2025-05-07T19:45:01.1568043Z #define __UINT_LEAST8_TYPE__ unsigned char 2025-05-07T19:45:01.1568396Z #define __USER_LABEL_PREFIX__ 2025-05-07T19:45:01.1569118Z #define __VERSION__ "Clang 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:45:01.1569791Z #define __WCHAR_MAX__ 2147483647 2025-05-07T19:45:01.1570103Z #define __WCHAR_TYPE__ int 2025-05-07T19:45:01.1570374Z #define __WCHAR_WIDTH__ 32 2025-05-07T19:45:01.1570778Z #define __WINT_MAX__ 4294967295U 2025-05-07T19:45:01.1571056Z #define __WINT_TYPE__ unsigned int 2025-05-07T19:45:01.1571359Z #define __WINT_UNSIGNED__ 1 2025-05-07T19:45:01.1571611Z #define __WINT_WIDTH__ 32 2025-05-07T19:45:01.1571888Z #define __amd64 1 2025-05-07T19:45:01.1572110Z #define __amd64__ 1 2025-05-07T19:45:01.1572355Z #define __clang__ 1 2025-05-07T19:45:01.1572629Z #define __clang_literal_encoding__ "UTF-8" 2025-05-07T19:45:01.1572932Z #define __clang_major__ 16 2025-05-07T19:45:01.1573212Z #define __clang_minor__ 0 2025-05-07T19:45:01.1573468Z #define __clang_patchlevel__ 6 2025-05-07T19:45:01.1574077Z #define __clang_version__ "16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:45:01.1575109Z #define __clang_wide_literal_encoding__ "UTF-32" 2025-05-07T19:45:01.1575552Z #define __code_model_small__ 1 2025-05-07T19:45:01.1575944Z #define __gnu_linux__ 1 2025-05-07T19:45:01.1576207Z #define __k8 1 2025-05-07T19:45:01.1576422Z #define __k8__ 1 2025-05-07T19:45:01.1576658Z #define __linux 1 2025-05-07T19:45:01.1576900Z #define __linux__ 1 2025-05-07T19:45:01.1577123Z #define __llvm__ 1 2025-05-07T19:45:01.1577362Z #define __pic__ 2 2025-05-07T19:45:01.1577589Z #define __pie__ 2 2025-05-07T19:45:01.1577885Z #define __seg_fs __attribute__((address_space(257))) 2025-05-07T19:45:01.1578274Z #define __seg_gs __attribute__((address_space(256))) 2025-05-07T19:45:01.1578626Z #define __tune_k8__ 1 2025-05-07T19:45:01.1578864Z #define __unix 1 2025-05-07T19:45:01.1579104Z #define __unix__ 1 2025-05-07T19:45:01.1579327Z #define __x86_64 1 2025-05-07T19:45:01.1579565Z #define __x86_64__ 1 2025-05-07T19:45:01.1579795Z #define linux 1 2025-05-07T19:45:01.1580024Z #define unix 1 2025-05-07T19:45:01.1580155Z 2025-05-07T19:45:01.2047316Z 2025-05-07T19:45:01.2048579Z [INFO] Printing out all preprocessor defines in the C++ compiler ... 2025-05-07T19:45:01.2049198Z + conda run -n build_binary c++ -dM -E -x c++ - 2025-05-07T19:45:01.2049469Z 2025-05-07T19:45:03.0232454Z #define _GNU_SOURCE 1 2025-05-07T19:45:03.0233042Z #define _LP64 1 2025-05-07T19:45:03.0233335Z #define __ATOMIC_ACQUIRE 2 2025-05-07T19:45:03.0233635Z #define __ATOMIC_ACQ_REL 4 2025-05-07T19:45:03.0234016Z #define __ATOMIC_CONSUME 1 2025-05-07T19:45:03.0234307Z #define __ATOMIC_RELAXED 0 2025-05-07T19:45:03.0234617Z #define __ATOMIC_RELEASE 3 2025-05-07T19:45:03.0234901Z #define __ATOMIC_SEQ_CST 5 2025-05-07T19:45:03.0235333Z #define __BIGGEST_ALIGNMENT__ 16 2025-05-07T19:45:03.0235647Z #define __BITINT_MAXWIDTH__ 8388608 2025-05-07T19:45:03.0235986Z #define __BOOL_WIDTH__ 8 2025-05-07T19:45:03.0236292Z #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:45:03.0236669Z #define __CHAR16_TYPE__ unsigned short 2025-05-07T19:45:03.0237019Z #define __CHAR32_TYPE__ unsigned int 2025-05-07T19:45:03.0237337Z #define __CHAR_BIT__ 8 2025-05-07T19:45:03.0237632Z #define __CLANG_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:45:03.0237961Z #define __CLANG_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:45:03.0238325Z #define __CLANG_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:45:03.0238653Z #define __CLANG_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:45:03.0239008Z #define __CLANG_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:45:03.0239848Z #define __CLANG_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:45:03.0240186Z #define __CLANG_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:45:03.0240534Z #define __CLANG_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:45:03.0240855Z #define __CLANG_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:45:03.0241201Z #define __CLANG_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:45:03.0241514Z #define __CONSTANT_CFSTRINGS__ 1 2025-05-07T19:45:03.0241827Z #define __DBL_DECIMAL_DIG__ 17 2025-05-07T19:45:03.0242131Z #define __DBL_DENORM_MIN__ 4.9406564584124654e-324 2025-05-07T19:45:03.0242482Z #define __DBL_DIG__ 15 2025-05-07T19:45:03.0243137Z #define __DBL_EPSILON__ 2.2204460492503131e-16 2025-05-07T19:45:03.0243472Z #define __DBL_HAS_DENORM__ 1 2025-05-07T19:45:03.0243772Z #define __DBL_HAS_INFINITY__ 1 2025-05-07T19:45:03.0244078Z #define __DBL_HAS_QUIET_NAN__ 1 2025-05-07T19:45:03.0244354Z #define __DBL_MANT_DIG__ 53 2025-05-07T19:45:03.0244657Z #define __DBL_MAX_10_EXP__ 308 2025-05-07T19:45:03.0244937Z #define __DBL_MAX_EXP__ 1024 2025-05-07T19:45:03.0245260Z #define __DBL_MAX__ 1.7976931348623157e+308 2025-05-07T19:45:03.0245579Z #define __DBL_MIN_10_EXP__ (-307) 2025-05-07T19:45:03.0245900Z #define __DBL_MIN_EXP__ (-1021) 2025-05-07T19:45:03.0246188Z #define __DBL_MIN__ 2.2250738585072014e-308 2025-05-07T19:45:03.0246551Z #define __DECIMAL_DIG__ __LDBL_DECIMAL_DIG__ 2025-05-07T19:45:03.0246870Z #define __DEPRECATED 1 2025-05-07T19:45:03.0247151Z #define __ELF__ 1 2025-05-07T19:45:03.0247420Z #define __EXCEPTIONS 1 2025-05-07T19:45:03.0247685Z #define __FINITE_MATH_ONLY__ 0 2025-05-07T19:45:03.0247987Z #define __FLOAT128__ 1 2025-05-07T19:45:03.0248244Z #define __FLT16_DECIMAL_DIG__ 5 2025-05-07T19:45:03.0248592Z #define __FLT16_DENORM_MIN__ 5.9604644775390625e-8F16 2025-05-07T19:45:03.0248933Z #define __FLT16_DIG__ 3 2025-05-07T19:45:03.0249234Z #define __FLT16_EPSILON__ 9.765625e-4F16 2025-05-07T19:45:03.0249546Z #define __FLT16_HAS_DENORM__ 1 2025-05-07T19:45:03.0249867Z #define __FLT16_HAS_INFINITY__ 1 2025-05-07T19:45:03.0250164Z #define __FLT16_HAS_QUIET_NAN__ 1 2025-05-07T19:45:03.0250485Z #define __FLT16_MANT_DIG__ 11 2025-05-07T19:45:03.0250789Z #define __FLT16_MAX_10_EXP__ 4 2025-05-07T19:45:03.0251049Z #define __FLT16_MAX_EXP__ 16 2025-05-07T19:45:03.0251344Z #define __FLT16_MAX__ 6.5504e+4F16 2025-05-07T19:45:03.0251626Z #define __FLT16_MIN_10_EXP__ (-4) 2025-05-07T19:45:03.0251932Z #define __FLT16_MIN_EXP__ (-13) 2025-05-07T19:45:03.0252204Z #define __FLT16_MIN__ 6.103515625e-5F16 2025-05-07T19:45:03.0252522Z #define __FLT_DECIMAL_DIG__ 9 2025-05-07T19:45:03.0252803Z #define __FLT_DENORM_MIN__ 1.40129846e-45F 2025-05-07T19:45:03.0253129Z #define __FLT_DIG__ 6 2025-05-07T19:45:03.0253382Z #define __FLT_EPSILON__ 1.19209290e-7F 2025-05-07T19:45:03.0253703Z #define __FLT_HAS_DENORM__ 1 2025-05-07T19:45:03.0253997Z #define __FLT_HAS_INFINITY__ 1 2025-05-07T19:45:03.0254272Z #define __FLT_HAS_QUIET_NAN__ 1 2025-05-07T19:45:03.0254568Z #define __FLT_MANT_DIG__ 24 2025-05-07T19:45:03.0254829Z #define __FLT_MAX_10_EXP__ 38 2025-05-07T19:45:03.0255144Z #define __FLT_MAX_EXP__ 128 2025-05-07T19:45:03.0255557Z #define __FLT_MAX__ 3.40282347e+38F 2025-05-07T19:45:03.0256075Z #define __FLT_MIN_10_EXP__ (-37) 2025-05-07T19:45:03.0256466Z #define __FLT_MIN_EXP__ (-125) 2025-05-07T19:45:03.0256783Z #define __FLT_MIN__ 1.17549435e-38F 2025-05-07T19:45:03.0257084Z #define __FLT_RADIX__ 2 2025-05-07T19:45:03.0257366Z #define __FXSR__ 1 2025-05-07T19:45:03.0257654Z #define __GCC_ASM_FLAG_OUTPUTS__ 1 2025-05-07T19:45:03.0257978Z #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:45:03.0258332Z #define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:45:03.0258678Z #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:45:03.0259042Z #define __GCC_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:45:03.0259364Z #define __GCC_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:45:03.0259715Z #define __GCC_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:45:03.0260039Z #define __GCC_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:45:03.0260391Z #define __GCC_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:45:03.0260848Z #define __GCC_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:45:03.0261220Z #define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 2025-05-07T19:45:03.0261603Z #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:45:03.0262055Z #define __GCC_HAVE_DWARF2_CFI_ASM 1 2025-05-07T19:45:03.0262398Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 2025-05-07T19:45:03.0262732Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 2025-05-07T19:45:03.0263083Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 2025-05-07T19:45:03.0263408Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 2025-05-07T19:45:03.0263842Z #define __GLIBCXX_BITSIZE_INT_N_0 128 2025-05-07T19:45:03.0264145Z #define __GLIBCXX_TYPE_INT_N_0 __int128 2025-05-07T19:45:03.0264468Z #define __GNUC_GNU_INLINE__ 1 2025-05-07T19:45:03.0264760Z #define __GNUC_MINOR__ 2 2025-05-07T19:45:03.0265022Z #define __GNUC_PATCHLEVEL__ 1 2025-05-07T19:45:03.0265315Z #define __GNUC__ 4 2025-05-07T19:45:03.0265540Z #define __GNUG__ 4 2025-05-07T19:45:03.0265804Z #define __GXX_ABI_VERSION 1002 2025-05-07T19:45:03.0266095Z #define __GXX_EXPERIMENTAL_CXX0X__ 1 2025-05-07T19:45:03.0266405Z #define __GXX_RTTI 1 2025-05-07T19:45:03.0266637Z #define __GXX_WEAK__ 1 2025-05-07T19:45:03.0266908Z #define __INT16_C_SUFFIX__ 2025-05-07T19:45:03.0267165Z #define __INT16_FMTd__ "hd" 2025-05-07T19:45:03.0267446Z #define __INT16_FMTi__ "hi" 2025-05-07T19:45:03.0267700Z #define __INT16_MAX__ 32767 2025-05-07T19:45:03.0267980Z #define __INT16_TYPE__ short 2025-05-07T19:45:03.0268267Z #define __INT32_C_SUFFIX__ 2025-05-07T19:45:03.0268520Z #define __INT32_FMTd__ "d" 2025-05-07T19:45:03.0268799Z #define __INT32_FMTi__ "i" 2025-05-07T19:45:03.0269057Z #define __INT32_MAX__ 2147483647 2025-05-07T19:45:03.0269355Z #define __INT32_TYPE__ int 2025-05-07T19:45:03.0269612Z #define __INT64_C_SUFFIX__ L 2025-05-07T19:45:03.0269903Z #define __INT64_FMTd__ "ld" 2025-05-07T19:45:03.0270166Z #define __INT64_FMTi__ "li" 2025-05-07T19:45:03.0270463Z #define __INT64_MAX__ 9223372036854775807L 2025-05-07T19:45:03.0270769Z #define __INT64_TYPE__ long int 2025-05-07T19:45:03.0271071Z #define __INT8_C_SUFFIX__ 2025-05-07T19:45:03.0271359Z #define __INT8_FMTd__ "hhd" 2025-05-07T19:45:03.0271623Z #define __INT8_FMTi__ "hhi" 2025-05-07T19:45:03.0271916Z #define __INT8_MAX__ 127 2025-05-07T19:45:03.0272176Z #define __INT8_TYPE__ signed char 2025-05-07T19:45:03.0272491Z #define __INTMAX_C_SUFFIX__ L 2025-05-07T19:45:03.0272766Z #define __INTMAX_FMTd__ "ld" 2025-05-07T19:45:03.0273064Z #define __INTMAX_FMTi__ "li" 2025-05-07T19:45:03.0273347Z #define __INTMAX_MAX__ 9223372036854775807L 2025-05-07T19:45:03.0273684Z #define __INTMAX_TYPE__ long int 2025-05-07T19:45:03.0273966Z #define __INTMAX_WIDTH__ 64 2025-05-07T19:45:03.0274256Z #define __INTPTR_FMTd__ "ld" 2025-05-07T19:45:03.0274556Z #define __INTPTR_FMTi__ "li" 2025-05-07T19:45:03.0275215Z #define __INTPTR_MAX__ 9223372036854775807L 2025-05-07T19:45:03.0275591Z #define __INTPTR_TYPE__ long int 2025-05-07T19:45:03.0275898Z #define __INTPTR_WIDTH__ 64 2025-05-07T19:45:03.0276218Z #define __INT_FAST16_FMTd__ "hd" 2025-05-07T19:45:03.0276526Z #define __INT_FAST16_FMTi__ "hi" 2025-05-07T19:45:03.0276860Z #define __INT_FAST16_MAX__ 32767 2025-05-07T19:45:03.0277155Z #define __INT_FAST16_TYPE__ short 2025-05-07T19:45:03.0277495Z #define __INT_FAST16_WIDTH__ 16 2025-05-07T19:45:03.0277797Z #define __INT_FAST32_FMTd__ "d" 2025-05-07T19:45:03.0278128Z #define __INT_FAST32_FMTi__ "i" 2025-05-07T19:45:03.0278464Z #define __INT_FAST32_MAX__ 2147483647 2025-05-07T19:45:03.0278790Z #define __INT_FAST32_TYPE__ int 2025-05-07T19:45:03.0279117Z #define __INT_FAST32_WIDTH__ 32 2025-05-07T19:45:03.0279423Z #define __INT_FAST64_FMTd__ "ld" 2025-05-07T19:45:03.0279764Z #define __INT_FAST64_FMTi__ "li" 2025-05-07T19:45:03.0280084Z #define __INT_FAST64_MAX__ 9223372036854775807L 2025-05-07T19:45:03.0280463Z #define __INT_FAST64_TYPE__ long int 2025-05-07T19:45:03.0280776Z #define __INT_FAST64_WIDTH__ 64 2025-05-07T19:45:03.0281109Z #define __INT_FAST8_FMTd__ "hhd" 2025-05-07T19:45:03.0281522Z #define __INT_FAST8_FMTi__ "hhi" 2025-05-07T19:45:03.0282006Z #define __INT_FAST8_MAX__ 127 2025-05-07T19:45:03.0282321Z #define __INT_FAST8_TYPE__ signed char 2025-05-07T19:45:03.0282631Z #define __INT_FAST8_WIDTH__ 8 2025-05-07T19:45:03.0282931Z #define __INT_LEAST16_FMTd__ "hd" 2025-05-07T19:45:03.0283213Z #define __INT_LEAST16_FMTi__ "hi" 2025-05-07T19:45:03.0283519Z #define __INT_LEAST16_MAX__ 32767 2025-05-07T19:45:03.0283800Z #define __INT_LEAST16_TYPE__ short 2025-05-07T19:45:03.0284116Z #define __INT_LEAST16_WIDTH__ 16 2025-05-07T19:45:03.0284390Z #define __INT_LEAST32_FMTd__ "d" 2025-05-07T19:45:03.0284781Z #define __INT_LEAST32_FMTi__ "i" 2025-05-07T19:45:03.0285069Z #define __INT_LEAST32_MAX__ 2147483647 2025-05-07T19:45:03.0285383Z #define __INT_LEAST32_TYPE__ int 2025-05-07T19:45:03.0285687Z #define __INT_LEAST32_WIDTH__ 32 2025-05-07T19:45:03.0285964Z #define __INT_LEAST64_FMTd__ "ld" 2025-05-07T19:45:03.0286275Z #define __INT_LEAST64_FMTi__ "li" 2025-05-07T19:45:03.0286579Z #define __INT_LEAST64_MAX__ 9223372036854775807L 2025-05-07T19:45:03.0286939Z #define __INT_LEAST64_TYPE__ long int 2025-05-07T19:45:03.0287242Z #define __INT_LEAST64_WIDTH__ 64 2025-05-07T19:45:03.0287552Z #define __INT_LEAST8_FMTd__ "hhd" 2025-05-07T19:45:03.0308261Z #define __INT_LEAST8_FMTi__ "hhi" 2025-05-07T19:45:03.0308606Z #define __INT_LEAST8_MAX__ 127 2025-05-07T19:45:03.0308942Z #define __INT_LEAST8_TYPE__ signed char 2025-05-07T19:45:03.0309255Z #define __INT_LEAST8_WIDTH__ 8 2025-05-07T19:45:03.0309556Z #define __INT_MAX__ 2147483647 2025-05-07T19:45:03.0309825Z #define __INT_WIDTH__ 32 2025-05-07T19:45:03.0310132Z #define __LDBL_DECIMAL_DIG__ 21 2025-05-07T19:45:03.0310455Z #define __LDBL_DENORM_MIN__ 3.64519953188247460253e-4951L 2025-05-07T19:45:03.0310839Z #define __LDBL_DIG__ 18 2025-05-07T19:45:03.0311150Z #define __LDBL_EPSILON__ 1.08420217248550443401e-19L 2025-05-07T19:45:03.0311490Z #define __LDBL_HAS_DENORM__ 1 2025-05-07T19:45:03.0311799Z #define __LDBL_HAS_INFINITY__ 1 2025-05-07T19:45:03.0312079Z #define __LDBL_HAS_QUIET_NAN__ 1 2025-05-07T19:45:03.0312406Z #define __LDBL_MANT_DIG__ 64 2025-05-07T19:45:03.0312675Z #define __LDBL_MAX_10_EXP__ 4932 2025-05-07T19:45:03.0312983Z #define __LDBL_MAX_EXP__ 16384 2025-05-07T19:45:03.0313286Z #define __LDBL_MAX__ 1.18973149535723176502e+4932L 2025-05-07T19:45:03.0313645Z #define __LDBL_MIN_10_EXP__ (-4931) 2025-05-07T19:45:03.0313948Z #define __LDBL_MIN_EXP__ (-16381) 2025-05-07T19:45:03.0314285Z #define __LDBL_MIN__ 3.36210314311209350626e-4932L 2025-05-07T19:45:03.0314641Z #define __LITTLE_ENDIAN__ 1 2025-05-07T19:45:03.0314913Z #define __LLONG_WIDTH__ 64 2025-05-07T19:45:03.0315239Z #define __LONG_LONG_MAX__ 9223372036854775807LL 2025-05-07T19:45:03.0315567Z #define __LONG_MAX__ 9223372036854775807L 2025-05-07T19:45:03.0315886Z #define __LONG_WIDTH__ 64 2025-05-07T19:45:03.0316134Z #define __LP64__ 1 2025-05-07T19:45:03.0316385Z #define __MMX__ 1 2025-05-07T19:45:03.0316619Z #define __NO_INLINE__ 1 2025-05-07T19:45:03.0316891Z #define __NO_MATH_INLINES 1 2025-05-07T19:45:03.0317163Z #define __OBJC_BOOL_IS_BOOL 0 2025-05-07T19:45:03.0317496Z #define __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES 3 2025-05-07T19:45:03.0317864Z #define __OPENCL_MEMORY_SCOPE_DEVICE 2 2025-05-07T19:45:03.0318180Z #define __OPENCL_MEMORY_SCOPE_SUB_GROUP 4 2025-05-07T19:45:03.0318532Z #define __OPENCL_MEMORY_SCOPE_WORK_GROUP 1 2025-05-07T19:45:03.0318853Z #define __OPENCL_MEMORY_SCOPE_WORK_ITEM 0 2025-05-07T19:45:03.0319194Z #define __ORDER_BIG_ENDIAN__ 4321 2025-05-07T19:45:03.0319484Z #define __ORDER_LITTLE_ENDIAN__ 1234 2025-05-07T19:45:03.0319816Z #define __ORDER_PDP_ENDIAN__ 3412 2025-05-07T19:45:03.0320105Z #define __PIC__ 2 2025-05-07T19:45:03.0320357Z #define __PIE__ 2 2025-05-07T19:45:03.0320599Z #define __POINTER_WIDTH__ 64 2025-05-07T19:45:03.0320906Z #define __PRAGMA_REDEFINE_EXTNAME 1 2025-05-07T19:45:03.0321232Z #define __PTRDIFF_FMTd__ "ld" 2025-05-07T19:45:03.0321517Z #define __PTRDIFF_FMTi__ "li" 2025-05-07T19:45:03.0321839Z #define __PTRDIFF_MAX__ 9223372036854775807L 2025-05-07T19:45:03.0322153Z #define __PTRDIFF_TYPE__ long int 2025-05-07T19:45:03.0322700Z #define __PTRDIFF_WIDTH__ 64 2025-05-07T19:45:03.0322979Z #define __REGISTER_PREFIX__ 2025-05-07T19:45:03.0323282Z #define __SCHAR_MAX__ 127 2025-05-07T19:45:03.0323544Z #define __SEG_FS 1 2025-05-07T19:45:03.0323804Z #define __SEG_GS 1 2025-05-07T19:45:03.0324046Z #define __SHRT_MAX__ 32767 2025-05-07T19:45:03.0324338Z #define __SHRT_WIDTH__ 16 2025-05-07T19:45:03.0324630Z #define __SIG_ATOMIC_MAX__ 2147483647 2025-05-07T19:45:03.0324921Z #define __SIG_ATOMIC_WIDTH__ 32 2025-05-07T19:45:03.0325225Z #define __SIZEOF_DOUBLE__ 8 2025-05-07T19:45:03.0325588Z #define __SIZEOF_FLOAT128__ 16 2025-05-07T19:45:03.0325897Z #define __SIZEOF_FLOAT__ 4 2025-05-07T19:45:03.0326157Z #define __SIZEOF_INT128__ 16 2025-05-07T19:45:03.0326451Z #define __SIZEOF_INT__ 4 2025-05-07T19:45:03.0326716Z #define __SIZEOF_LONG_DOUBLE__ 16 2025-05-07T19:45:03.0327038Z #define __SIZEOF_LONG_LONG__ 8 2025-05-07T19:45:03.0327310Z #define __SIZEOF_LONG__ 8 2025-05-07T19:45:03.0327607Z #define __SIZEOF_POINTER__ 8 2025-05-07T19:45:03.0327912Z #define __SIZEOF_PTRDIFF_T__ 8 2025-05-07T19:45:03.0328183Z #define __SIZEOF_SHORT__ 2 2025-05-07T19:45:03.0328471Z #define __SIZEOF_SIZE_T__ 8 2025-05-07T19:45:03.0328738Z #define __SIZEOF_WCHAR_T__ 4 2025-05-07T19:45:03.0329038Z #define __SIZEOF_WINT_T__ 4 2025-05-07T19:45:03.0329305Z #define __SIZE_FMTX__ "lX" 2025-05-07T19:45:03.0329603Z #define __SIZE_FMTo__ "lo" 2025-05-07T19:45:03.0329862Z #define __SIZE_FMTu__ "lu" 2025-05-07T19:45:03.0330156Z #define __SIZE_FMTx__ "lx" 2025-05-07T19:45:03.0330437Z #define __SIZE_MAX__ 18446744073709551615UL 2025-05-07T19:45:03.0330779Z #define __SIZE_TYPE__ long unsigned int 2025-05-07T19:45:03.0331075Z #define __SIZE_WIDTH__ 64 2025-05-07T19:45:03.0331353Z #define __SSE2_MATH__ 1 2025-05-07T19:45:03.0331619Z #define __SSE2__ 1 2025-05-07T19:45:03.0331846Z #define __SSE_MATH__ 1 2025-05-07T19:45:03.0332117Z #define __SSE__ 1 2025-05-07T19:45:03.0332379Z #define __STDCPP_DEFAULT_NEW_ALIGNMENT__ 16UL 2025-05-07T19:45:03.0332734Z #define __STDCPP_THREADS__ 1 2025-05-07T19:45:03.0333006Z #define __STDC_HOSTED__ 1 2025-05-07T19:45:03.0333281Z #define __STDC_UTF_16__ 1 2025-05-07T19:45:03.0333533Z #define __STDC_UTF_32__ 1 2025-05-07T19:45:03.0333818Z #define __STDC__ 1 2025-05-07T19:45:03.0334060Z #define __UINT16_C_SUFFIX__ 2025-05-07T19:45:03.0334351Z #define __UINT16_FMTX__ "hX" 2025-05-07T19:45:03.0334648Z #define __UINT16_FMTo__ "ho" 2025-05-07T19:45:03.0334916Z #define __UINT16_FMTu__ "hu" 2025-05-07T19:45:03.0335341Z #define __UINT16_FMTx__ "hx" 2025-05-07T19:45:03.0335794Z #define __UINT16_MAX__ 65535 2025-05-07T19:45:03.0336134Z #define __UINT16_TYPE__ unsigned short 2025-05-07T19:45:03.0336462Z #define __UINT32_C_SUFFIX__ U 2025-05-07T19:45:03.0336782Z #define __UINT32_FMTX__ "X" 2025-05-07T19:45:03.0337068Z #define __UINT32_FMTo__ "o" 2025-05-07T19:45:03.0337381Z #define __UINT32_FMTu__ "u" 2025-05-07T19:45:03.0337665Z #define __UINT32_FMTx__ "x" 2025-05-07T19:45:03.0337972Z #define __UINT32_MAX__ 4294967295U 2025-05-07T19:45:03.0338332Z #define __UINT32_TYPE__ unsigned int 2025-05-07T19:45:03.0338647Z #define __UINT64_C_SUFFIX__ UL 2025-05-07T19:45:03.0338965Z #define __UINT64_FMTX__ "lX" 2025-05-07T19:45:03.0339251Z #define __UINT64_FMTo__ "lo" 2025-05-07T19:45:03.0339563Z #define __UINT64_FMTu__ "lu" 2025-05-07T19:45:03.0339846Z #define __UINT64_FMTx__ "lx" 2025-05-07T19:45:03.0340170Z #define __UINT64_MAX__ 18446744073709551615UL 2025-05-07T19:45:03.0340518Z #define __UINT64_TYPE__ long unsigned int 2025-05-07T19:45:03.0340881Z #define __UINT8_C_SUFFIX__ 2025-05-07T19:45:03.0341169Z #define __UINT8_FMTX__ "hhX" 2025-05-07T19:45:03.0341480Z #define __UINT8_FMTo__ "hho" 2025-05-07T19:45:03.0341792Z #define __UINT8_FMTu__ "hhu" 2025-05-07T19:45:03.0342077Z #define __UINT8_FMTx__ "hhx" 2025-05-07T19:45:03.0342389Z #define __UINT8_MAX__ 255 2025-05-07T19:45:03.0342675Z #define __UINT8_TYPE__ unsigned char 2025-05-07T19:45:03.0343023Z #define __UINTMAX_C_SUFFIX__ UL 2025-05-07T19:45:03.0343323Z #define __UINTMAX_FMTX__ "lX" 2025-05-07T19:45:03.0346532Z #define __UINTMAX_FMTo__ "lo" 2025-05-07T19:45:03.0346830Z #define __UINTMAX_FMTu__ "lu" 2025-05-07T19:45:03.0347158Z #define __UINTMAX_FMTx__ "lx" 2025-05-07T19:45:03.0347481Z #define __UINTMAX_MAX__ 18446744073709551615UL 2025-05-07T19:45:03.0347876Z #define __UINTMAX_TYPE__ long unsigned int 2025-05-07T19:45:03.0348353Z #define __UINTMAX_WIDTH__ 64 2025-05-07T19:45:03.0348627Z #define __UINTPTR_FMTX__ "lX" 2025-05-07T19:45:03.0348923Z #define __UINTPTR_FMTo__ "lo" 2025-05-07T19:45:03.0349178Z #define __UINTPTR_FMTu__ "lu" 2025-05-07T19:45:03.0349542Z #define __UINTPTR_FMTx__ "lx" 2025-05-07T19:45:03.0349817Z #define __UINTPTR_MAX__ 18446744073709551615UL 2025-05-07T19:45:03.0350144Z #define __UINTPTR_TYPE__ long unsigned int 2025-05-07T19:45:03.0350438Z #define __UINTPTR_WIDTH__ 64 2025-05-07T19:45:03.0350713Z #define __UINT_FAST16_FMTX__ "hX" 2025-05-07T19:45:03.0350982Z #define __UINT_FAST16_FMTo__ "ho" 2025-05-07T19:45:03.0351270Z #define __UINT_FAST16_FMTu__ "hu" 2025-05-07T19:45:03.0351566Z #define __UINT_FAST16_FMTx__ "hx" 2025-05-07T19:45:03.0351827Z #define __UINT_FAST16_MAX__ 65535 2025-05-07T19:45:03.0352124Z #define __UINT_FAST16_TYPE__ unsigned short 2025-05-07T19:45:03.0352420Z #define __UINT_FAST32_FMTX__ "X" 2025-05-07T19:45:03.0352709Z #define __UINT_FAST32_FMTo__ "o" 2025-05-07T19:45:03.0352971Z #define __UINT_FAST32_FMTu__ "u" 2025-05-07T19:45:03.0353247Z #define __UINT_FAST32_FMTx__ "x" 2025-05-07T19:45:03.0353519Z #define __UINT_FAST32_MAX__ 4294967295U 2025-05-07T19:45:03.0353861Z #define __UINT_FAST32_TYPE__ unsigned int 2025-05-07T19:45:03.0354186Z #define __UINT_FAST64_FMTX__ "lX" 2025-05-07T19:45:03.0354500Z #define __UINT_FAST64_FMTo__ "lo" 2025-05-07T19:45:03.0354815Z #define __UINT_FAST64_FMTu__ "lu" 2025-05-07T19:45:03.0355100Z #define __UINT_FAST64_FMTx__ "lx" 2025-05-07T19:45:03.0355437Z #define __UINT_FAST64_MAX__ 18446744073709551615UL 2025-05-07T19:45:03.0355799Z #define __UINT_FAST64_TYPE__ long unsigned int 2025-05-07T19:45:03.0356166Z #define __UINT_FAST8_FMTX__ "hhX" 2025-05-07T19:45:03.0356448Z #define __UINT_FAST8_FMTo__ "hho" 2025-05-07T19:45:03.0356752Z #define __UINT_FAST8_FMTu__ "hhu" 2025-05-07T19:45:03.0357033Z #define __UINT_FAST8_FMTx__ "hhx" 2025-05-07T19:45:03.0357337Z #define __UINT_FAST8_MAX__ 255 2025-05-07T19:45:03.0357650Z #define __UINT_FAST8_TYPE__ unsigned char 2025-05-07T19:45:03.0357962Z #define __UINT_LEAST16_FMTX__ "hX" 2025-05-07T19:45:03.0358279Z #define __UINT_LEAST16_FMTo__ "ho" 2025-05-07T19:45:03.0358566Z #define __UINT_LEAST16_FMTu__ "hu" 2025-05-07T19:45:03.0358891Z #define __UINT_LEAST16_FMTx__ "hx" 2025-05-07T19:45:03.0359175Z #define __UINT_LEAST16_MAX__ 65535 2025-05-07T19:45:03.0359504Z #define __UINT_LEAST16_TYPE__ unsigned short 2025-05-07T19:45:03.0359825Z #define __UINT_LEAST32_FMTX__ "X" 2025-05-07T19:45:03.0360139Z #define __UINT_LEAST32_FMTo__ "o" 2025-05-07T19:45:03.0360426Z #define __UINT_LEAST32_FMTu__ "u" 2025-05-07T19:45:03.0360743Z #define __UINT_LEAST32_FMTx__ "x" 2025-05-07T19:45:03.0361061Z #define __UINT_LEAST32_MAX__ 4294967295U 2025-05-07T19:45:03.0361371Z #define __UINT_LEAST32_TYPE__ unsigned int 2025-05-07T19:45:03.0361711Z #define __UINT_LEAST64_FMTX__ "lX" 2025-05-07T19:45:03.0362003Z #define __UINT_LEAST64_FMTo__ "lo" 2025-05-07T19:45:03.0362322Z #define __UINT_LEAST64_FMTu__ "lu" 2025-05-07T19:45:03.0362612Z #define __UINT_LEAST64_FMTx__ "lx" 2025-05-07T19:45:03.0362962Z #define __UINT_LEAST64_MAX__ 18446744073709551615UL 2025-05-07T19:45:03.0363326Z #define __UINT_LEAST64_TYPE__ long unsigned int 2025-05-07T19:45:03.0363684Z #define __UINT_LEAST8_FMTX__ "hhX" 2025-05-07T19:45:03.0364014Z #define __UINT_LEAST8_FMTo__ "hho" 2025-05-07T19:45:03.0364304Z #define __UINT_LEAST8_FMTu__ "hhu" 2025-05-07T19:45:03.0364618Z #define __UINT_LEAST8_FMTx__ "hhx" 2025-05-07T19:45:03.0364904Z #define __UINT_LEAST8_MAX__ 255 2025-05-07T19:45:03.0365220Z #define __UINT_LEAST8_TYPE__ unsigned char 2025-05-07T19:45:03.0365532Z #define __USER_LABEL_PREFIX__ 2025-05-07T19:45:03.0366174Z #define __VERSION__ "Clang 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:45:03.0366889Z #define __WCHAR_MAX__ 2147483647 2025-05-07T19:45:03.0367198Z #define __WCHAR_TYPE__ int 2025-05-07T19:45:03.0367485Z #define __WCHAR_WIDTH__ 32 2025-05-07T19:45:03.0367743Z #define __WINT_MAX__ 4294967295U 2025-05-07T19:45:03.0368014Z #define __WINT_TYPE__ unsigned int 2025-05-07T19:45:03.0368333Z #define __WINT_UNSIGNED__ 1 2025-05-07T19:45:03.0368620Z #define __WINT_WIDTH__ 32 2025-05-07T19:45:03.0368875Z #define __amd64 1 2025-05-07T19:45:03.0369208Z #define __amd64__ 1 2025-05-07T19:45:03.0369439Z #define __clang__ 1 2025-05-07T19:45:03.0369731Z #define __clang_literal_encoding__ "UTF-8" 2025-05-07T19:45:03.0370037Z #define __clang_major__ 16 2025-05-07T19:45:03.0370316Z #define __clang_minor__ 0 2025-05-07T19:45:03.0370576Z #define __clang_patchlevel__ 6 2025-05-07T19:45:03.0371182Z #define __clang_version__ "16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:45:03.0371857Z #define __clang_wide_literal_encoding__ "UTF-32" 2025-05-07T19:45:03.0372190Z #define __code_model_small__ 1 2025-05-07T19:45:03.0372645Z #define __cplusplus 201703L 2025-05-07T19:45:03.0372925Z #define __cpp_aggregate_bases 201603L 2025-05-07T19:45:03.0373258Z #define __cpp_aggregate_nsdmi 201304L 2025-05-07T19:45:03.0373561Z #define __cpp_alias_templates 200704L 2025-05-07T19:45:03.0373887Z #define __cpp_aligned_new 201606L 2025-05-07T19:45:03.0374177Z #define __cpp_attributes 200809L 2025-05-07T19:45:03.0374496Z #define __cpp_binary_literals 201304L 2025-05-07T19:45:03.0375168Z #define __cpp_capture_star_this 201603L 2025-05-07T19:45:03.0375598Z #define __cpp_constexpr 201603L 2025-05-07T19:45:03.0375949Z #define __cpp_constexpr_in_decltype 201711L 2025-05-07T19:45:03.0376293Z #define __cpp_decltype 200707L 2025-05-07T19:45:03.0376628Z #define __cpp_decltype_auto 201304L 2025-05-07T19:45:03.0376953Z #define __cpp_deduction_guides 201703L 2025-05-07T19:45:03.0377331Z #define __cpp_delegating_constructors 200604L 2025-05-07T19:45:03.0377686Z #define __cpp_digit_separators 201309L 2025-05-07T19:45:03.0378051Z #define __cpp_enumerator_attributes 201411L 2025-05-07T19:45:03.0378388Z #define __cpp_exceptions 199711L 2025-05-07T19:45:03.0378721Z #define __cpp_fold_expressions 201603L 2025-05-07T19:45:03.0379047Z #define __cpp_generic_lambdas 201304L 2025-05-07T19:45:03.0379414Z #define __cpp_guaranteed_copy_elision 201606L 2025-05-07T19:45:03.0379782Z #define __cpp_hex_float 201603L 2025-05-07T19:45:03.0380087Z #define __cpp_if_constexpr 201606L 2025-05-07T19:45:03.0380440Z #define __cpp_impl_destroying_delete 201806L 2025-05-07T19:45:03.0380806Z #define __cpp_inheriting_constructors 201511L 2025-05-07T19:45:03.0381176Z #define __cpp_init_captures 201304L 2025-05-07T19:45:03.0381492Z #define __cpp_initializer_lists 200806L 2025-05-07T19:45:03.0381844Z #define __cpp_inline_variables 201606L 2025-05-07T19:45:03.0382160Z #define __cpp_lambdas 200907L 2025-05-07T19:45:03.0382504Z #define __cpp_named_character_escapes 202207L 2025-05-07T19:45:03.0382881Z #define __cpp_namespace_attributes 201411L 2025-05-07T19:45:03.0383250Z #define __cpp_nested_namespace_definitions 201411L 2025-05-07T19:45:03.0383652Z #define __cpp_noexcept_function_type 201510L 2025-05-07T19:45:03.0384004Z #define __cpp_nontype_template_args 201411L 2025-05-07T19:45:03.0384406Z #define __cpp_nontype_template_parameter_auto 201606L 2025-05-07T19:45:03.0384770Z #define __cpp_nsdmi 200809L 2025-05-07T19:45:03.0385084Z #define __cpp_range_based_for 201603L 2025-05-07T19:45:03.0385410Z #define __cpp_raw_strings 200710L 2025-05-07T19:45:03.0385747Z #define __cpp_ref_qualifiers 200710L 2025-05-07T19:45:03.0386100Z #define __cpp_return_type_deduction 201304L 2025-05-07T19:45:03.0386429Z #define __cpp_rtti 199711L 2025-05-07T19:45:03.0386750Z #define __cpp_rvalue_references 200610L 2025-05-07T19:45:03.0387087Z #define __cpp_static_assert 201411L 2025-05-07T19:45:03.0387440Z #define __cpp_static_call_operator 202207L 2025-05-07T19:45:03.0387955Z #define __cpp_structured_bindings 201606L 2025-05-07T19:45:03.0388422Z #define __cpp_template_auto 201606L 2025-05-07T19:45:03.0388731Z #define __cpp_threadsafe_static_init 200806L 2025-05-07T19:45:03.0389094Z #define __cpp_unicode_characters 200704L 2025-05-07T19:45:03.0389408Z #define __cpp_unicode_literals 200710L 2025-05-07T19:45:03.0389758Z #define __cpp_user_defined_literals 200809L 2025-05-07T19:45:03.0390108Z #define __cpp_variable_templates 201304L 2025-05-07T19:45:03.0390428Z #define __cpp_variadic_templates 200704L 2025-05-07T19:45:03.0390888Z #define __cpp_variadic_using 201611L 2025-05-07T19:45:03.0391181Z #define __gnu_linux__ 1 2025-05-07T19:45:03.0391460Z #define __k8 1 2025-05-07T19:45:03.0391687Z #define __k8__ 1 2025-05-07T19:45:03.0391950Z #define __linux 1 2025-05-07T19:45:03.0392182Z #define __linux__ 1 2025-05-07T19:45:03.0392441Z #define __llvm__ 1 2025-05-07T19:45:03.0392668Z #define __pic__ 2 2025-05-07T19:45:03.0392927Z #define __pie__ 2 2025-05-07T19:45:03.0393195Z #define __private_extern__ extern 2025-05-07T19:45:03.0393520Z #define __seg_fs __attribute__((address_space(257))) 2025-05-07T19:45:03.0393924Z #define __seg_gs __attribute__((address_space(256))) 2025-05-07T19:45:03.0394252Z #define __tune_k8__ 1 2025-05-07T19:45:03.0394513Z #define __unix 1 2025-05-07T19:45:03.0394734Z #define __unix__ 1 2025-05-07T19:45:03.0394995Z #define __x86_64 1 2025-05-07T19:45:03.0395220Z #define __x86_64__ 1 2025-05-07T19:45:03.0395475Z #define linux 1 2025-05-07T19:45:03.0395699Z #define unix 1 2025-05-07T19:45:03.0395856Z 2025-05-07T19:45:03.0983483Z 2025-05-07T19:45:03.0984171Z + conda run -n build_binary c++ --version 2025-05-07T19:45:03.0984885Z 2025-05-07T19:45:04.9290606Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:45:04.9291337Z Target: x86_64-conda-linux-gnu 2025-05-07T19:45:04.9291644Z Thread model: posix 2025-05-07T19:45:04.9292020Z InstalledDir: /github/home/miniconda/envs/build_binary/bin 2025-05-07T19:45:04.9292739Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang++.cfg 2025-05-07T19:45:04.9293247Z 2025-05-07T19:45:05.0031293Z 2025-05-07T19:45:05.0032165Z [INFO] Printing the default version of the C standard used by the compiler ... 2025-05-07T19:45:05.0032819Z + conda run -n build_binary cc -dM -E - < /dev/null | grep __STDC_VERSION__ 2025-05-07T19:45:05.0033190Z 2025-05-07T19:45:06.9180085Z #define __STDC_VERSION__ 201710L 2025-05-07T19:45:06.9180632Z 2025-05-07T19:45:06.9181089Z [INFO] Printing the default version of the C++ standard used by the compiler ... 2025-05-07T19:45:06.9181937Z + conda run -n build_binary c++ -dM -E -x c++ - < /dev/null | grep __cplusplus 2025-05-07T19:45:06.9182451Z 2025-05-07T19:45:08.8085052Z #define __cplusplus 201703L 2025-05-07T19:45:08.8085444Z 2025-05-07T19:45:08.8086475Z [INSTALL] Successfully installed C/C++ compilers 2025-05-07T19:45:08.8169121Z ##[group]Run . $PRELUDE; install_build_tools $BUILD_ENV 2025-05-07T19:45:08.8169630Z . $PRELUDE; install_build_tools $BUILD_ENV 2025-05-07T19:45:08.8170629Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:45:08.8170941Z env: 2025-05-07T19:45:08.8171192Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:45:08.8171533Z BUILD_ENV: build_binary 2025-05-07T19:45:08.8171828Z BUILD_TARGET: genai 2025-05-07T19:45:08.8172053Z BUILD_VARIANT: cuda 2025-05-07T19:45:08.8172305Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:45:08.8172556Z ##[endgroup] 2025-05-07T19:45:09.3360558Z ################################################################################ 2025-05-07T19:45:09.3361630Z # Install Build Tools 2025-05-07T19:45:09.3362292Z # 2025-05-07T19:45:09.3375585Z # [2025-05-07T19:45:09.336Z] + install_build_tools build_binary 2025-05-07T19:45:09.3376158Z ################################################################################ 2025-05-07T19:45:09.3376507Z 2025-05-07T19:45:09.3393440Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:45:09.4243439Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:45:09.4254250Z [INSTALL] Installing build tools ... 2025-05-07T19:45:09.4282048Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y auditwheel bazel cmake>=3.30 hypothesis jinja2 make ncurses ninja openblas patchelf rhash scikit-build wheel pyyaml 2025-05-07T19:45:10.1393763Z Channels: 2025-05-07T19:45:10.1394756Z - conda-forge 2025-05-07T19:45:10.1395205Z Platform: linux-64 2025-05-07T19:45:13.2936938Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:45:16.9887105Z Solving environment: \ | / - done 2025-05-07T19:45:17.0458017Z 2025-05-07T19:45:17.0458846Z ## Package Plan ## 2025-05-07T19:45:17.0459542Z 2025-05-07T19:45:17.0460478Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:45:17.0462165Z 2025-05-07T19:45:17.0462740Z added / updated specs: 2025-05-07T19:45:17.0463595Z - auditwheel 2025-05-07T19:45:17.0464205Z - bazel 2025-05-07T19:45:17.0464841Z - cmake[version='>=3.30'] 2025-05-07T19:45:17.0465618Z - hypothesis 2025-05-07T19:45:17.0466219Z - jinja2 2025-05-07T19:45:17.0466813Z - make 2025-05-07T19:45:17.0467354Z - ncurses 2025-05-07T19:45:17.0467942Z - ninja 2025-05-07T19:45:17.0468493Z - openblas 2025-05-07T19:45:17.0469101Z - patchelf 2025-05-07T19:45:17.0469621Z - pyyaml 2025-05-07T19:45:17.0469849Z - rhash 2025-05-07T19:45:17.0470058Z - scikit-build 2025-05-07T19:45:17.0470301Z - wheel 2025-05-07T19:45:17.0470416Z 2025-05-07T19:45:17.0470420Z 2025-05-07T19:45:17.0470546Z The following packages will be downloaded: 2025-05-07T19:45:17.0470791Z 2025-05-07T19:45:17.0470913Z package | build 2025-05-07T19:45:17.0471269Z ---------------------------|----------------- 2025-05-07T19:45:17.0471658Z alsa-lib-1.2.14 | hb9d3cd8_0 553 KB conda-forge 2025-05-07T19:45:17.0472195Z attrs-25.3.0 | pyh71513ae_0 56 KB conda-forge 2025-05-07T19:45:17.0472668Z auditwheel-6.2.0 | pyha804496_1 40 KB conda-forge 2025-05-07T19:45:17.0473092Z bazel-7.5.0 | h96810dc_2 47.4 MB conda-forge 2025-05-07T19:45:17.0473524Z c-ares-1.34.5 | hb9d3cd8_0 202 KB conda-forge 2025-05-07T19:45:17.0473955Z cairo-1.18.0 | hbb29018_2 961 KB conda-forge 2025-05-07T19:45:17.0474352Z click-8.1.8 | pyh707e725_0 83 KB conda-forge 2025-05-07T19:45:17.0475207Z cmake-4.0.2 | h74e3db0_0 19.4 MB conda-forge 2025-05-07T19:45:17.0475682Z distro-1.9.0 | pyhd8ed1ab_1 41 KB conda-forge 2025-05-07T19:45:17.0476589Z exceptiongroup-1.2.2 | pyhd8ed1ab_1 20 KB conda-forge 2025-05-07T19:45:17.0477113Z expat-2.7.0 | h5888daf_0 137 KB conda-forge 2025-05-07T19:45:17.0477619Z font-ttf-dejavu-sans-mono-2.37| hab24e00_0 388 KB conda-forge 2025-05-07T19:45:17.0478219Z font-ttf-inconsolata-3.000 | h77eed37_0 94 KB conda-forge 2025-05-07T19:45:17.0478784Z font-ttf-source-code-pro-2.038| h77eed37_0 684 KB conda-forge 2025-05-07T19:45:17.0479343Z font-ttf-ubuntu-0.83 | h77eed37_3 1.5 MB conda-forge 2025-05-07T19:45:17.0479828Z fontconfig-2.15.0 | h7e30c49_1 259 KB conda-forge 2025-05-07T19:45:17.0480367Z fonts-conda-ecosystem-1 | 0 4 KB conda-forge 2025-05-07T19:45:17.0480903Z fonts-conda-forge-1 | 0 4 KB conda-forge 2025-05-07T19:45:17.0481377Z freetype-2.13.3 | ha770c72_1 168 KB conda-forge 2025-05-07T19:45:17.0481849Z giflib-5.2.2 | hd590300_0 75 KB conda-forge 2025-05-07T19:45:17.0482442Z graphite2-1.3.13 | h59595ed_1003 95 KB conda-forge 2025-05-07T19:45:17.0482936Z harfbuzz-9.0.0 | hfac3d4d_0 1.5 MB conda-forge 2025-05-07T19:45:17.0483410Z hypothesis-6.131.14 | pyha770c72_0 348 KB conda-forge 2025-05-07T19:45:17.0483891Z ijar-7.5.0 | h5888daf_0 114 KB conda-forge 2025-05-07T19:45:17.0484356Z jinja2-3.1.6 | pyhd8ed1ab_0 110 KB conda-forge 2025-05-07T19:45:17.0484803Z keyutils-1.6.1 | h166bdaf_0 115 KB conda-forge 2025-05-07T19:45:17.0485262Z krb5-1.21.3 | h659f571_0 1.3 MB conda-forge 2025-05-07T19:45:17.0485685Z lcms2-2.17 | h717163a_0 242 KB conda-forge 2025-05-07T19:45:17.0486129Z lerc-4.0.0 | h0aef613_1 258 KB conda-forge 2025-05-07T19:45:17.0486636Z libabseil-20250127.1 | cxx17_hbbce691_0 1.3 MB conda-forge 2025-05-07T19:45:17.0487126Z libcups-2.3.3 | h4637d8d_4 4.3 MB conda-forge 2025-05-07T19:45:17.0487824Z libcurl-8.13.0 | h332b0f4_0 428 KB conda-forge 2025-05-07T19:45:17.0488256Z libdeflate-1.23 | h86f0d12_0 71 KB conda-forge 2025-05-07T19:45:17.0488741Z libedit-3.1.20250104 | pl5321h7949ede_0 132 KB conda-forge 2025-05-07T19:45:17.0489179Z libev-4.33 | hd590300_2 110 KB conda-forge 2025-05-07T19:45:17.0489615Z libexpat-2.7.0 | h5888daf_0 73 KB conda-forge 2025-05-07T19:45:17.0490082Z libfreetype-2.13.3 | ha770c72_1 8 KB conda-forge 2025-05-07T19:45:17.0490541Z libfreetype6-2.13.3 | h48d6fc4_1 371 KB conda-forge 2025-05-07T19:45:17.0491026Z libgfortran-15.1.0 | h69a702a_2 34 KB conda-forge 2025-05-07T19:45:17.0491485Z libgfortran5-15.1.0 | hcea5267_2 1.5 MB conda-forge 2025-05-07T19:45:17.0491954Z libglib-2.84.0 | h2ff4ddf_0 3.8 MB conda-forge 2025-05-07T19:45:17.0492374Z libgrpc-1.71.0 | h8e591d7_1 7.6 MB conda-forge 2025-05-07T19:45:17.0492834Z libjpeg-turbo-3.1.0 | hb9d3cd8_0 614 KB conda-forge 2025-05-07T19:45:17.0493299Z liblzma-5.8.1 | hb9d3cd8_1 110 KB conda-forge 2025-05-07T19:45:17.0493732Z liblzma-devel-5.8.1 | hb9d3cd8_1 431 KB conda-forge 2025-05-07T19:45:17.0494206Z libnghttp2-1.64.0 | h161d5f1_0 632 KB conda-forge 2025-05-07T19:45:17.0494670Z libopenblas-0.3.29 |pthreads_h94d23a6_0 5.6 MB conda-forge 2025-05-07T19:45:17.0495143Z libpng-1.6.47 | h943b412_0 282 KB conda-forge 2025-05-07T19:45:17.0496024Z libprotobuf-5.29.3 | h501fc15_1 3.2 MB conda-forge 2025-05-07T19:45:17.0496570Z libre2-11-2024.07.02 | hba17884_3 205 KB conda-forge 2025-05-07T19:45:17.0497071Z libsqlite-3.49.2 | hee588c1_0 895 KB conda-forge 2025-05-07T19:45:17.0497528Z libssh2-1.11.1 | hcf80075_0 298 KB conda-forge 2025-05-07T19:45:17.0497998Z libtiff-4.7.0 | hd9ff511_4 419 KB conda-forge 2025-05-07T19:45:17.0498441Z libuuid-2.38.1 | h0b41bf4_0 33 KB conda-forge 2025-05-07T19:45:17.0498913Z libuv-1.50.0 | hb9d3cd8_0 870 KB conda-forge 2025-05-07T19:45:17.0499396Z libwebp-base-1.5.0 | h851e524_0 420 KB conda-forge 2025-05-07T19:45:17.0499855Z libxcb-1.17.0 | h8a09558_0 387 KB conda-forge 2025-05-07T19:45:17.0500323Z libzlib-1.3.1 | hb9d3cd8_2 60 KB conda-forge 2025-05-07T19:45:17.0500829Z make-4.4.1 | hb9d3cd8_2 501 KB conda-forge 2025-05-07T19:45:17.0501308Z markupsafe-3.0.2 | py313h8060acc_1 24 KB conda-forge 2025-05-07T19:45:17.0501795Z ncurses-6.5 | h2d0b736_3 871 KB conda-forge 2025-05-07T19:45:17.0502223Z ninja-1.12.1 | hff21bea_1 158 KB conda-forge 2025-05-07T19:45:17.0502715Z openblas-0.3.29 |pthreads_h6ec200e_0 5.8 MB conda-forge 2025-05-07T19:45:17.0503192Z openjdk-23.0.1 | h4c11d01_0 181.3 MB conda-forge 2025-05-07T19:45:17.0503677Z packaging-25.0 | pyh29332c3_1 61 KB conda-forge 2025-05-07T19:45:17.0504140Z patchelf-0.18.0 | h3f2d84a_2 133 KB conda-forge 2025-05-07T19:45:17.0504605Z pcre2-10.44 | hc749103_2 934 KB conda-forge 2025-05-07T19:45:17.0505061Z pixman-0.46.0 | h29eaf8c_0 389 KB conda-forge 2025-05-07T19:45:17.0505531Z pthread-stubs-0.4 | hb9d3cd8_1002 8 KB conda-forge 2025-05-07T19:45:17.0506039Z pyelftools-0.32 | pyh707e725_1 146 KB conda-forge 2025-05-07T19:45:17.0506505Z python-3.13.2 |hf636f53_101_cp313 31.7 MB conda-forge 2025-05-07T19:45:17.0506985Z pyyaml-6.0.2 | py313h8060acc_2 201 KB conda-forge 2025-05-07T19:45:17.0507429Z re2-2024.07.02 | h9925aae_3 26 KB conda-forge 2025-05-07T19:45:17.0507883Z rhash-1.4.5 | hb9d3cd8_0 183 KB conda-forge 2025-05-07T19:45:17.0508462Z scikit-build-0.18.1 | pyhae55e72_2 114 KB conda-forge 2025-05-07T19:45:17.0508907Z singlejar-7.5.0 | h0e684df_1 122 KB conda-forge 2025-05-07T19:45:17.0509405Z sortedcontainers-2.4.0 | pyhd8ed1ab_1 28 KB conda-forge 2025-05-07T19:45:17.0509870Z sqlite-3.49.2 | h9eae976_0 840 KB conda-forge 2025-05-07T19:45:17.0510302Z tk-8.6.13 |noxft_h4845f30_101 3.2 MB conda-forge 2025-05-07T19:45:17.0510735Z tomli-2.2.1 | pyhd8ed1ab_1 19 KB conda-forge 2025-05-07T19:45:17.0511149Z wheel-0.45.1 | pyhd8ed1ab_1 61 KB conda-forge 2025-05-07T19:45:17.0511597Z xorg-libice-1.1.2 | hb9d3cd8_0 57 KB conda-forge 2025-05-07T19:45:17.0512031Z xorg-libsm-1.2.6 | he73a12e_0 27 KB conda-forge 2025-05-07T19:45:17.0512490Z xorg-libx11-1.8.12 | h4f16b4b_0 816 KB conda-forge 2025-05-07T19:45:17.0512930Z xorg-libxau-1.0.12 | hb9d3cd8_0 14 KB conda-forge 2025-05-07T19:45:17.0513408Z xorg-libxdmcp-1.1.5 | hb9d3cd8_0 19 KB conda-forge 2025-05-07T19:45:17.0513966Z xorg-libxext-1.3.6 | hb9d3cd8_0 49 KB conda-forge 2025-05-07T19:45:17.0514440Z xorg-libxfixes-6.0.1 | hb9d3cd8_0 19 KB conda-forge 2025-05-07T19:45:17.0514916Z xorg-libxi-1.8.2 | hb9d3cd8_0 46 KB conda-forge 2025-05-07T19:45:17.0515367Z xorg-libxrandr-1.5.4 | hb9d3cd8_0 29 KB conda-forge 2025-05-07T19:45:17.0515877Z xorg-libxrender-0.9.12 | hb9d3cd8_0 32 KB conda-forge 2025-05-07T19:45:17.0516342Z xorg-libxt-1.3.1 | hb9d3cd8_0 371 KB conda-forge 2025-05-07T19:45:17.0516812Z xorg-libxtst-1.2.5 | hb9d3cd8_3 32 KB conda-forge 2025-05-07T19:45:17.0517248Z xz-5.8.1 | hbcc6ac9_1 23 KB conda-forge 2025-05-07T19:45:17.0517658Z xz-gpl-tools-5.8.1 | hbcc6ac9_1 33 KB conda-forge 2025-05-07T19:45:17.0518115Z xz-tools-5.8.1 | hb9d3cd8_1 94 KB conda-forge 2025-05-07T19:45:17.0518595Z yaml-0.2.5 | h7f98852_2 87 KB conda-forge 2025-05-07T19:45:17.0519012Z zlib-1.3.1 | hb9d3cd8_2 90 KB conda-forge 2025-05-07T19:45:17.0519426Z zstd-1.5.7 | hb8e6e7a_2 554 KB conda-forge 2025-05-07T19:45:17.0519814Z ------------------------------------------------------------ 2025-05-07T19:45:17.0520190Z Total: 339.1 MB 2025-05-07T19:45:17.0520406Z 2025-05-07T19:45:17.0520543Z The following NEW packages will be INSTALLED: 2025-05-07T19:45:17.0520795Z 2025-05-07T19:45:17.0520999Z alsa-lib conda-forge/linux-64::alsa-lib-1.2.14-hb9d3cd8_0 2025-05-07T19:45:17.0521435Z attrs conda-forge/noarch::attrs-25.3.0-pyh71513ae_0 2025-05-07T19:45:17.0521916Z auditwheel conda-forge/noarch::auditwheel-6.2.0-pyha804496_1 2025-05-07T19:45:17.0522387Z bazel conda-forge/linux-64::bazel-7.5.0-h96810dc_2 2025-05-07T19:45:17.0522836Z c-ares conda-forge/linux-64::c-ares-1.34.5-hb9d3cd8_0 2025-05-07T19:45:17.0523285Z cairo conda-forge/linux-64::cairo-1.18.0-hbb29018_2 2025-05-07T19:45:17.0523721Z click conda-forge/noarch::click-8.1.8-pyh707e725_0 2025-05-07T19:45:17.0524134Z cmake conda-forge/linux-64::cmake-4.0.2-h74e3db0_0 2025-05-07T19:45:17.0524579Z distro conda-forge/noarch::distro-1.9.0-pyhd8ed1ab_1 2025-05-07T19:45:17.0525073Z exceptiongroup conda-forge/noarch::exceptiongroup-1.2.2-pyhd8ed1ab_1 2025-05-07T19:45:17.0525708Z font-ttf-dejavu-s~ conda-forge/noarch::font-ttf-dejavu-sans-mono-2.37-hab24e00_0 2025-05-07T19:45:17.0526361Z font-ttf-inconsol~ conda-forge/noarch::font-ttf-inconsolata-3.000-h77eed37_0 2025-05-07T19:45:17.0526979Z font-ttf-source-c~ conda-forge/noarch::font-ttf-source-code-pro-2.038-h77eed37_0 2025-05-07T19:45:17.0527584Z font-ttf-ubuntu conda-forge/noarch::font-ttf-ubuntu-0.83-h77eed37_3 2025-05-07T19:45:17.0528108Z fontconfig conda-forge/linux-64::fontconfig-2.15.0-h7e30c49_1 2025-05-07T19:45:17.0528649Z fonts-conda-ecosy~ conda-forge/noarch::fonts-conda-ecosystem-1-0 2025-05-07T19:45:17.0529183Z fonts-conda-forge conda-forge/noarch::fonts-conda-forge-1-0 2025-05-07T19:45:17.0529659Z freetype conda-forge/linux-64::freetype-2.13.3-ha770c72_1 2025-05-07T19:45:17.0530132Z giflib conda-forge/linux-64::giflib-5.2.2-hd590300_0 2025-05-07T19:45:17.0530594Z graphite2 conda-forge/linux-64::graphite2-1.3.13-h59595ed_1003 2025-05-07T19:45:17.0531100Z harfbuzz conda-forge/linux-64::harfbuzz-9.0.0-hfac3d4d_0 2025-05-07T19:45:17.0531602Z hypothesis conda-forge/noarch::hypothesis-6.131.14-pyha770c72_0 2025-05-07T19:45:17.0532065Z ijar conda-forge/linux-64::ijar-7.5.0-h5888daf_0 2025-05-07T19:45:17.0532518Z jinja2 conda-forge/noarch::jinja2-3.1.6-pyhd8ed1ab_0 2025-05-07T19:45:17.0533090Z keyutils conda-forge/linux-64::keyutils-1.6.1-h166bdaf_0 2025-05-07T19:45:17.0533558Z krb5 conda-forge/linux-64::krb5-1.21.3-h659f571_0 2025-05-07T19:45:17.0533997Z lcms2 conda-forge/linux-64::lcms2-2.17-h717163a_0 2025-05-07T19:45:17.0534407Z lerc conda-forge/linux-64::lerc-4.0.0-h0aef613_1 2025-05-07T19:45:17.0534906Z libabseil conda-forge/linux-64::libabseil-20250127.1-cxx17_hbbce691_0 2025-05-07T19:45:17.0535509Z libcups conda-forge/linux-64::libcups-2.3.3-h4637d8d_4 2025-05-07T19:45:17.0536176Z libcurl conda-forge/linux-64::libcurl-8.13.0-h332b0f4_0 2025-05-07T19:45:17.0536695Z libdeflate conda-forge/linux-64::libdeflate-1.23-h86f0d12_0 2025-05-07T19:45:17.0537222Z libedit conda-forge/linux-64::libedit-3.1.20250104-pl5321h7949ede_0 2025-05-07T19:45:17.0537740Z libev conda-forge/linux-64::libev-4.33-hd590300_2 2025-05-07T19:45:17.0538203Z libexpat conda-forge/linux-64::libexpat-2.7.0-h5888daf_0 2025-05-07T19:45:17.0538819Z libfreetype conda-forge/linux-64::libfreetype-2.13.3-ha770c72_1 2025-05-07T19:45:17.0539360Z libfreetype6 conda-forge/linux-64::libfreetype6-2.13.3-h48d6fc4_1 2025-05-07T19:45:17.0539923Z libgfortran conda-forge/linux-64::libgfortran-15.1.0-h69a702a_2 2025-05-07T19:45:17.0540485Z libgfortran5 conda-forge/linux-64::libgfortran5-15.1.0-hcea5267_2 2025-05-07T19:45:17.0540995Z libglib conda-forge/linux-64::libglib-2.84.0-h2ff4ddf_0 2025-05-07T19:45:17.0541498Z libgrpc conda-forge/linux-64::libgrpc-1.71.0-h8e591d7_1 2025-05-07T19:45:17.0542013Z libjpeg-turbo conda-forge/linux-64::libjpeg-turbo-3.1.0-hb9d3cd8_0 2025-05-07T19:45:17.0542561Z liblzma conda-forge/linux-64::liblzma-5.8.1-hb9d3cd8_1 2025-05-07T19:45:17.0543102Z liblzma-devel conda-forge/linux-64::liblzma-devel-5.8.1-hb9d3cd8_1 2025-05-07T19:45:17.0543646Z libnghttp2 conda-forge/linux-64::libnghttp2-1.64.0-h161d5f1_0 2025-05-07T19:45:17.0544244Z libopenblas conda-forge/linux-64::libopenblas-0.3.29-pthreads_h94d23a6_0 2025-05-07T19:45:17.0544784Z libpng conda-forge/linux-64::libpng-1.6.47-h943b412_0 2025-05-07T19:45:17.0545310Z libprotobuf conda-forge/linux-64::libprotobuf-5.29.3-h501fc15_1 2025-05-07T19:45:17.0545860Z libre2-11 conda-forge/linux-64::libre2-11-2024.07.02-hba17884_3 2025-05-07T19:45:17.0546370Z libsqlite conda-forge/linux-64::libsqlite-3.49.2-hee588c1_0 2025-05-07T19:45:17.0546887Z libssh2 conda-forge/linux-64::libssh2-1.11.1-hcf80075_0 2025-05-07T19:45:17.0547359Z libtiff conda-forge/linux-64::libtiff-4.7.0-hd9ff511_4 2025-05-07T19:45:17.0547842Z libuv conda-forge/linux-64::libuv-1.50.0-hb9d3cd8_0 2025-05-07T19:45:17.0548456Z libwebp-base conda-forge/linux-64::libwebp-base-1.5.0-h851e524_0 2025-05-07T19:45:17.0548926Z libxcb conda-forge/linux-64::libxcb-1.17.0-h8a09558_0 2025-05-07T19:45:17.0549366Z make conda-forge/linux-64::make-4.4.1-hb9d3cd8_2 2025-05-07T19:45:17.0550018Z markupsafe conda-forge/linux-64::markupsafe-3.0.2-py313h8060acc_1 2025-05-07T19:45:17.0550534Z ninja conda-forge/linux-64::ninja-1.12.1-hff21bea_1 2025-05-07T19:45:17.0551051Z openblas conda-forge/linux-64::openblas-0.3.29-pthreads_h6ec200e_0 2025-05-07T19:45:17.0551558Z openjdk conda-forge/linux-64::openjdk-23.0.1-h4c11d01_0 2025-05-07T19:45:17.0552221Z packaging conda-forge/noarch::packaging-25.0-pyh29332c3_1 2025-05-07T19:45:17.0552899Z patchelf conda-forge/linux-64::patchelf-0.18.0-h3f2d84a_2 2025-05-07T19:45:17.0553685Z pcre2 conda-forge/linux-64::pcre2-10.44-hc749103_2 2025-05-07T19:45:17.0554162Z pixman conda-forge/linux-64::pixman-0.46.0-h29eaf8c_0 2025-05-07T19:45:17.0554859Z pthread-stubs conda-forge/linux-64::pthread-stubs-0.4-hb9d3cd8_1002 2025-05-07T19:45:17.0555666Z pyelftools conda-forge/noarch::pyelftools-0.32-pyh707e725_1 2025-05-07T19:45:17.0556582Z pyyaml conda-forge/linux-64::pyyaml-6.0.2-py313h8060acc_2 2025-05-07T19:45:17.0557091Z re2 conda-forge/linux-64::re2-2024.07.02-h9925aae_3 2025-05-07T19:45:17.0557550Z rhash conda-forge/linux-64::rhash-1.4.5-hb9d3cd8_0 2025-05-07T19:45:17.0558044Z scikit-build conda-forge/noarch::scikit-build-0.18.1-pyhae55e72_2 2025-05-07T19:45:17.0558599Z singlejar conda-forge/linux-64::singlejar-7.5.0-h0e684df_1 2025-05-07T19:45:17.0559614Z sortedcontainers conda-forge/noarch::sortedcontainers-2.4.0-pyhd8ed1ab_1 2025-05-07T19:45:17.0560232Z tomli conda-forge/noarch::tomli-2.2.1-pyhd8ed1ab_1 2025-05-07T19:45:17.0560723Z xorg-libice conda-forge/linux-64::xorg-libice-1.1.2-hb9d3cd8_0 2025-05-07T19:45:17.0561270Z xorg-libsm conda-forge/linux-64::xorg-libsm-1.2.6-he73a12e_0 2025-05-07T19:45:17.0561910Z xorg-libx11 conda-forge/linux-64::xorg-libx11-1.8.12-h4f16b4b_0 2025-05-07T19:45:17.0562434Z xorg-libxau conda-forge/linux-64::xorg-libxau-1.0.12-hb9d3cd8_0 2025-05-07T19:45:17.0563005Z xorg-libxdmcp conda-forge/linux-64::xorg-libxdmcp-1.1.5-hb9d3cd8_0 2025-05-07T19:45:17.0563547Z xorg-libxext conda-forge/linux-64::xorg-libxext-1.3.6-hb9d3cd8_0 2025-05-07T19:45:17.0564129Z xorg-libxfixes conda-forge/linux-64::xorg-libxfixes-6.0.1-hb9d3cd8_0 2025-05-07T19:45:17.0564692Z xorg-libxi conda-forge/linux-64::xorg-libxi-1.8.2-hb9d3cd8_0 2025-05-07T19:45:17.0565225Z xorg-libxrandr conda-forge/linux-64::xorg-libxrandr-1.5.4-hb9d3cd8_0 2025-05-07T19:45:17.0565834Z xorg-libxrender conda-forge/linux-64::xorg-libxrender-0.9.12-hb9d3cd8_0 2025-05-07T19:45:17.0566382Z xorg-libxt conda-forge/linux-64::xorg-libxt-1.3.1-hb9d3cd8_0 2025-05-07T19:45:17.0566932Z xorg-libxtst conda-forge/linux-64::xorg-libxtst-1.2.5-hb9d3cd8_3 2025-05-07T19:45:17.0567602Z xz-gpl-tools conda-forge/linux-64::xz-gpl-tools-5.8.1-hbcc6ac9_1 2025-05-07T19:45:17.0568090Z xz-tools conda-forge/linux-64::xz-tools-5.8.1-hb9d3cd8_1 2025-05-07T19:45:17.0568552Z yaml conda-forge/linux-64::yaml-0.2.5-h7f98852_2 2025-05-07T19:45:17.0568815Z 2025-05-07T19:45:17.0568946Z The following packages will be UPDATED: 2025-05-07T19:45:17.0569238Z 2025-05-07T19:45:17.0569766Z libuuid pkgs/main::libuuid-1.41.5-h5eee18b_0 --> conda-forge::libuuid-2.38.1-h0b41bf4_0 2025-05-07T19:45:17.0570568Z libzlib 1.2.13-h4ab18f5_6 --> 1.3.1-hb9d3cd8_2 2025-05-07T19:45:17.0571115Z ncurses pkgs/main::ncurses-6.4-h6a678d5_0 --> conda-forge::ncurses-6.5-h2d0b736_3 2025-05-07T19:45:17.0571821Z python pkgs/main::python-3.13.2-hf623796_100~ --> conda-forge::python-3.13.2-hf636f53_101_cp313 2025-05-07T19:45:17.0572528Z sqlite pkgs/main::sqlite-3.45.3-h5eee18b_0 --> conda-forge::sqlite-3.49.2-h9eae976_0 2025-05-07T19:45:17.0573208Z wheel pkgs/main/linux-64::wheel-0.45.1-py31~ --> conda-forge/noarch::wheel-0.45.1-pyhd8ed1ab_1 2025-05-07T19:45:17.0573852Z xz pkgs/main::xz-5.6.4-h5eee18b_1 --> conda-forge::xz-5.8.1-hbcc6ac9_1 2025-05-07T19:45:17.0574321Z zlib 1.2.13-h4ab18f5_6 --> 1.3.1-hb9d3cd8_2 2025-05-07T19:45:17.0575353Z zstd 1.5.6-ha6fb4c9_0 --> 1.5.7-hb8e6e7a_2 2025-05-07T19:45:17.0575974Z 2025-05-07T19:45:17.0576284Z The following packages will be SUPERSEDED by a higher-priority channel: 2025-05-07T19:45:17.0576632Z 2025-05-07T19:45:17.0576903Z expat pkgs/main::expat-2.7.1-h6a678d5_0 --> conda-forge::expat-2.7.0-h5888daf_0 2025-05-07T19:45:17.0577554Z tk pkgs/main::tk-8.6.14-h39e8969_0 --> conda-forge::tk-8.6.13-noxft_h4845f30_101 2025-05-07T19:45:17.0577915Z 2025-05-07T19:45:17.0578146Z 2025-05-07T19:45:17.0578160Z 2025-05-07T19:45:17.0578357Z Downloading and Extracting Packages: ...working... 2025-05-07T19:45:17.0578778Z openjdk-23.0.1 | 181.3 MB | | 0% 2025-05-07T19:45:17.0579064Z 2025-05-07T19:45:17.0579385Z bazel-7.5.0 | 47.4 MB | | 0%  2025-05-07T19:45:17.0579640Z 2025-05-07T19:45:17.0579644Z 2025-05-07T19:45:17.0579869Z python-3.13.2 | 31.7 MB | | 0%  2025-05-07T19:45:17.0580165Z 2025-05-07T19:45:17.0580170Z 2025-05-07T19:45:17.0580173Z 2025-05-07T19:45:17.0580398Z cmake-4.0.2 | 19.4 MB | | 0%  2025-05-07T19:45:17.0580664Z 2025-05-07T19:45:17.0580699Z 2025-05-07T19:45:17.0580703Z 2025-05-07T19:45:17.0583792Z 2025-05-07T19:45:17.0588281Z libgrpc-1.71.0 | 7.6 MB | | 0%  2025-05-07T19:45:17.0589795Z 2025-05-07T19:45:17.0589813Z 2025-05-07T19:45:17.0589834Z 2025-05-07T19:45:17.0589857Z 2025-05-07T19:45:17.0589903Z 2025-05-07T19:45:17.0591128Z openblas-0.3.29 | 5.8 MB | | 0%  2025-05-07T19:45:17.0593337Z 2025-05-07T19:45:17.0593358Z 2025-05-07T19:45:17.0593381Z 2025-05-07T19:45:17.0593403Z 2025-05-07T19:45:17.0593419Z 2025-05-07T19:45:17.0593440Z 2025-05-07T19:45:17.0594748Z libopenblas-0.3.29 | 5.6 MB | | 0%  2025-05-07T19:45:17.0596431Z 2025-05-07T19:45:17.0596452Z 2025-05-07T19:45:17.0596475Z 2025-05-07T19:45:17.0596489Z 2025-05-07T19:45:17.0596504Z 2025-05-07T19:45:17.0596522Z 2025-05-07T19:45:17.0596538Z 2025-05-07T19:45:17.0597449Z libcups-2.3.3 | 4.3 MB | | 0%  2025-05-07T19:45:17.0597759Z 2025-05-07T19:45:17.0597762Z 2025-05-07T19:45:17.0597766Z 2025-05-07T19:45:17.0597769Z 2025-05-07T19:45:17.0597772Z 2025-05-07T19:45:17.0597776Z 2025-05-07T19:45:17.0597780Z 2025-05-07T19:45:17.0597783Z 2025-05-07T19:45:17.0598045Z libglib-2.84.0 | 3.8 MB | | 0%  2025-05-07T19:45:17.0598374Z 2025-05-07T19:45:17.0598378Z 2025-05-07T19:45:17.0598381Z 2025-05-07T19:45:17.0598384Z 2025-05-07T19:45:17.0598388Z 2025-05-07T19:45:17.0598391Z 2025-05-07T19:45:17.0598394Z 2025-05-07T19:45:17.0598398Z 2025-05-07T19:45:17.0598401Z 2025-05-07T19:45:17.0598679Z libprotobuf-5.29.3 | 3.2 MB | | 0%  2025-05-07T19:45:17.0599019Z 2025-05-07T19:45:17.0599022Z 2025-05-07T19:45:17.0599026Z 2025-05-07T19:45:17.0599029Z 2025-05-07T19:45:17.0599032Z 2025-05-07T19:45:17.0599036Z 2025-05-07T19:45:17.0599039Z 2025-05-07T19:45:17.0599043Z 2025-05-07T19:45:17.0599046Z 2025-05-07T19:45:17.0599049Z 2025-05-07T19:45:17.0599288Z tk-8.6.13 | 3.2 MB | | 0%  2025-05-07T19:45:17.0599592Z 2025-05-07T19:45:17.0599596Z 2025-05-07T19:45:17.0599599Z 2025-05-07T19:45:17.0599603Z 2025-05-07T19:45:17.0599606Z 2025-05-07T19:45:17.0599609Z 2025-05-07T19:45:17.0599612Z 2025-05-07T19:45:17.0599620Z 2025-05-07T19:45:17.0599627Z 2025-05-07T19:45:17.0599631Z 2025-05-07T19:45:17.0599635Z 2025-05-07T19:45:17.0599928Z font-ttf-ubuntu-0.83 | 1.5 MB | | 0%  2025-05-07T19:45:17.0600284Z 2025-05-07T19:45:17.0600287Z 2025-05-07T19:45:17.0600291Z 2025-05-07T19:45:17.0600294Z 2025-05-07T19:45:17.0600297Z 2025-05-07T19:45:17.0600301Z 2025-05-07T19:45:17.0600304Z 2025-05-07T19:45:17.0600307Z 2025-05-07T19:45:17.0600310Z 2025-05-07T19:45:17.0600314Z 2025-05-07T19:45:17.0600318Z 2025-05-07T19:45:17.0600321Z 2025-05-07T19:45:17.0601398Z harfbuzz-9.0.0 | 1.5 MB | | 0%  2025-05-07T19:45:17.0601969Z 2025-05-07T19:45:17.0601973Z 2025-05-07T19:45:17.0601979Z 2025-05-07T19:45:17.0601983Z 2025-05-07T19:45:17.0601989Z 2025-05-07T19:45:17.0602019Z 2025-05-07T19:45:17.0602025Z 2025-05-07T19:45:17.0602030Z 2025-05-07T19:45:17.0602036Z 2025-05-07T19:45:17.0602041Z 2025-05-07T19:45:17.0602164Z 2025-05-07T19:45:17.0602179Z 2025-05-07T19:45:17.0602186Z 2025-05-07T19:45:17.0602807Z libgfortran5-15.1.0 | 1.5 MB | | 0%  2025-05-07T19:45:17.0603339Z 2025-05-07T19:45:17.0603348Z 2025-05-07T19:45:17.0603354Z 2025-05-07T19:45:17.0603361Z 2025-05-07T19:45:17.0603369Z 2025-05-07T19:45:17.0603374Z 2025-05-07T19:45:17.0603381Z 2025-05-07T19:45:17.0603389Z 2025-05-07T19:45:17.0603395Z 2025-05-07T19:45:17.0603402Z 2025-05-07T19:45:17.0603410Z 2025-05-07T19:45:17.0603418Z 2025-05-07T19:45:17.0603423Z 2025-05-07T19:45:17.0603432Z 2025-05-07T19:45:17.0603940Z krb5-1.21.3 | 1.3 MB | | 0%  2025-05-07T19:45:17.0604463Z 2025-05-07T19:45:17.0604471Z 2025-05-07T19:45:17.0604479Z 2025-05-07T19:45:17.0604484Z 2025-05-07T19:45:17.0604492Z 2025-05-07T19:45:17.0604501Z 2025-05-07T19:45:17.0604510Z 2025-05-07T19:45:17.0604516Z 2025-05-07T19:45:17.0604552Z 2025-05-07T19:45:17.0604564Z 2025-05-07T19:45:17.0604683Z 2025-05-07T19:45:17.0604690Z 2025-05-07T19:45:17.0604698Z 2025-05-07T19:45:17.0604703Z 2025-05-07T19:45:17.0604710Z 2025-05-07T19:45:17.0605190Z libabseil-20250127.1 | 1.3 MB | | 0%  2025-05-07T19:45:17.0605916Z 2025-05-07T19:45:17.0605924Z 2025-05-07T19:45:17.0605929Z 2025-05-07T19:45:17.0605936Z 2025-05-07T19:45:17.0605943Z 2025-05-07T19:45:17.0605948Z 2025-05-07T19:45:17.0605952Z 2025-05-07T19:45:17.0605958Z 2025-05-07T19:45:17.0605963Z 2025-05-07T19:45:17.0605969Z 2025-05-07T19:45:17.0605979Z 2025-05-07T19:45:17.0605983Z 2025-05-07T19:45:17.0605988Z 2025-05-07T19:45:17.0605992Z 2025-05-07T19:45:17.0605997Z 2025-05-07T19:45:17.0606001Z 2025-05-07T19:45:17.0606590Z cairo-1.18.0 | 961 KB | | 0%  2025-05-07T19:45:17.0607098Z 2025-05-07T19:45:17.0607103Z 2025-05-07T19:45:17.0607110Z 2025-05-07T19:45:17.0607121Z 2025-05-07T19:45:17.0607133Z 2025-05-07T19:45:17.0607139Z 2025-05-07T19:45:17.0607144Z 2025-05-07T19:45:17.0607150Z 2025-05-07T19:45:17.0607156Z 2025-05-07T19:45:17.0607160Z 2025-05-07T19:45:17.0607166Z 2025-05-07T19:45:17.0607171Z 2025-05-07T19:45:17.0607180Z 2025-05-07T19:45:17.0607185Z 2025-05-07T19:45:17.0607217Z 2025-05-07T19:45:17.0607222Z 2025-05-07T19:45:17.0607230Z 2025-05-07T19:45:17.0607810Z pcre2-10.44 | 934 KB | | 0%  2025-05-07T19:45:17.0608174Z 2025-05-07T19:45:17.0608178Z 2025-05-07T19:45:17.0608182Z 2025-05-07T19:45:17.0608186Z 2025-05-07T19:45:17.0608190Z 2025-05-07T19:45:17.0608194Z 2025-05-07T19:45:17.0608225Z 2025-05-07T19:45:17.0608228Z 2025-05-07T19:45:17.0608232Z 2025-05-07T19:45:17.0608235Z 2025-05-07T19:45:17.0608238Z 2025-05-07T19:45:17.0608242Z 2025-05-07T19:45:17.0608245Z 2025-05-07T19:45:17.0608248Z 2025-05-07T19:45:17.0608251Z 2025-05-07T19:45:17.0608255Z 2025-05-07T19:45:17.0608262Z 2025-05-07T19:45:17.0608283Z 2025-05-07T19:45:17.0608604Z libsqlite-3.49.2 | 895 KB | | 0%  2025-05-07T19:45:17.0608963Z 2025-05-07T19:45:17.0608967Z 2025-05-07T19:45:17.0608971Z 2025-05-07T19:45:17.0608975Z 2025-05-07T19:45:17.0608978Z 2025-05-07T19:45:17.0608982Z 2025-05-07T19:45:17.0608985Z 2025-05-07T19:45:17.0608988Z 2025-05-07T19:45:17.0608991Z 2025-05-07T19:45:17.0608995Z 2025-05-07T19:45:17.0608998Z 2025-05-07T19:45:17.0609001Z 2025-05-07T19:45:17.0609005Z 2025-05-07T19:45:17.0609008Z 2025-05-07T19:45:17.0609011Z 2025-05-07T19:45:17.0609016Z 2025-05-07T19:45:17.0609020Z 2025-05-07T19:45:17.0609024Z 2025-05-07T19:45:17.0609027Z 2025-05-07T19:45:17.2714935Z ... (more hidden) ... 2025-05-07T19:45:17.2715442Z 2025-05-07T19:45:17.2715447Z 2025-05-07T19:45:17.2715451Z 2025-05-07T19:45:17.2715455Z 2025-05-07T19:45:17.2897358Z libgrpc-1.71.0 | 7.6 MB | | 0%  2025-05-07T19:45:17.2897721Z 2025-05-07T19:45:17.2897726Z 2025-05-07T19:45:17.2897730Z 2025-05-07T19:45:17.3336105Z cmake-4.0.2 | 19.4 MB | | 0%  2025-05-07T19:45:17.3336949Z 2025-05-07T19:45:17.3336963Z 2025-05-07T19:45:17.3577223Z python-3.13.2 | 31.7 MB | | 0%  2025-05-07T19:45:17.3715376Z openjdk-23.0.1 | 181.3 MB | | 0% 2025-05-07T19:45:17.3716413Z 2025-05-07T19:45:17.3716429Z 2025-05-07T19:45:17.3716441Z 2025-05-07T19:45:17.3716452Z 2025-05-07T19:45:17.3761085Z libgrpc-1.71.0 | 7.6 MB | #########8 | 99%  2025-05-07T19:45:17.3761450Z 2025-05-07T19:45:17.3905251Z bazel-7.5.0 | 47.4 MB | | 0%  2025-05-07T19:45:17.3905549Z 2025-05-07T19:45:17.3905580Z 2025-05-07T19:45:17.3905584Z 2025-05-07T19:45:17.4335857Z cmake-4.0.2 | 19.4 MB | ###9 | 39%  2025-05-07T19:45:17.4336355Z 2025-05-07T19:45:17.4336385Z 2025-05-07T19:45:17.4497463Z python-3.13.2 | 31.7 MB | ###8 | 38%  2025-05-07T19:45:17.4497768Z 2025-05-07T19:45:17.4497774Z 2025-05-07T19:45:17.4497779Z 2025-05-07T19:45:17.4497783Z 2025-05-07T19:45:17.4577030Z libgrpc-1.71.0 | 7.6 MB | ########## | 100%  2025-05-07T19:45:17.4764661Z openjdk-23.0.1 | 181.3 MB | 2 | 2% 2025-05-07T19:45:17.4765219Z 2025-05-07T19:45:17.4905870Z bazel-7.5.0 | 47.4 MB | #2 | 13%  2025-05-07T19:45:17.4906159Z 2025-05-07T19:45:17.4906164Z 2025-05-07T19:45:17.4906169Z 2025-05-07T19:45:17.5038225Z cmake-4.0.2 | 19.4 MB | ######7 | 68%  2025-05-07T19:45:17.5038589Z 2025-05-07T19:45:17.5038687Z 2025-05-07T19:45:17.5038691Z 2025-05-07T19:45:17.5038715Z 2025-05-07T19:45:17.5038718Z 2025-05-07T19:45:17.5582702Z openblas-0.3.29 | 5.8 MB | | 0%  2025-05-07T19:45:17.5766087Z openjdk-23.0.1 | 181.3 MB | 4 | 5% 2025-05-07T19:45:17.5766410Z 2025-05-07T19:45:17.5864100Z bazel-7.5.0 | 47.4 MB | ##1 | 21%  2025-05-07T19:45:17.5864393Z 2025-05-07T19:45:17.5865657Z 2025-05-07T19:45:17.6039319Z python-3.13.2 | 31.7 MB | ###### | 61%  2025-05-07T19:45:17.6039621Z 2025-05-07T19:45:17.6039661Z 2025-05-07T19:45:17.6039664Z 2025-05-07T19:45:17.6039754Z 2025-05-07T19:45:17.6039762Z 2025-05-07T19:45:17.6240114Z openblas-0.3.29 | 5.8 MB | ####### | 70%  2025-05-07T19:45:17.6240552Z 2025-05-07T19:45:17.6240560Z 2025-05-07T19:45:17.6240567Z 2025-05-07T19:45:17.6582308Z cmake-4.0.2 | 19.4 MB | #########2 | 93%  2025-05-07T19:45:17.6766688Z openjdk-23.0.1 | 181.3 MB | 6 | 7% 2025-05-07T19:45:17.6766977Z 2025-05-07T19:45:17.7037341Z bazel-7.5.0 | 47.4 MB | ### | 30%  2025-05-07T19:45:17.7037630Z 2025-05-07T19:45:17.7037635Z 2025-05-07T19:45:17.7301201Z python-3.13.2 | 31.7 MB | #######9 | 80%  2025-05-07T19:45:17.7301545Z 2025-05-07T19:45:17.7301550Z 2025-05-07T19:45:17.7301554Z 2025-05-07T19:45:17.7301557Z 2025-05-07T19:45:17.7301561Z 2025-05-07T19:45:17.7582958Z openblas-0.3.29 | 5.8 MB | ########## | 100%  2025-05-07T19:45:17.7942597Z openjdk-23.0.1 | 181.3 MB | # | 11% 2025-05-07T19:45:17.7943429Z 2025-05-07T19:45:17.7943444Z 2025-05-07T19:45:17.7943455Z 2025-05-07T19:45:17.7943466Z 2025-05-07T19:45:17.7943478Z 2025-05-07T19:45:17.7943489Z 2025-05-07T19:45:17.7987073Z libopenblas-0.3.29 | 5.6 MB | | 0%  2025-05-07T19:45:17.7987517Z 2025-05-07T19:45:17.8585148Z bazel-7.5.0 | 47.4 MB | ###8 | 38%  2025-05-07T19:45:17.8990081Z openjdk-23.0.1 | 181.3 MB | #4 | 14% 2025-05-07T19:45:17.8990593Z 2025-05-07T19:45:17.9584755Z bazel-7.5.0 | 47.4 MB | #####1 | 52%  2025-05-07T19:45:17.9625638Z openjdk-23.0.1 | 181.3 MB | #8 | 19% 2025-05-07T19:45:17.9626088Z 2025-05-07T19:45:17.9626093Z 2025-05-07T19:45:17.9626097Z 2025-05-07T19:45:17.9626100Z 2025-05-07T19:45:17.9626104Z 2025-05-07T19:45:17.9626107Z 2025-05-07T19:45:17.9626451Z libopenblas-0.3.29 | 5.6 MB | ########## | 100%  2025-05-07T19:45:17.9626760Z 2025-05-07T19:45:17.9626764Z 2025-05-07T19:45:17.9626767Z 2025-05-07T19:45:17.9626794Z 2025-05-07T19:45:17.9626798Z 2025-05-07T19:45:17.9626801Z 2025-05-07T19:45:17.9757234Z libopenblas-0.3.29 | 5.6 MB | ########## | 100%  2025-05-07T19:45:17.9757666Z 2025-05-07T19:45:17.9757675Z 2025-05-07T19:45:17.9757681Z 2025-05-07T19:45:18.0036465Z cmake-4.0.2 | 19.4 MB | ########## | 100%  2025-05-07T19:45:18.0036860Z 2025-05-07T19:45:18.0337712Z bazel-7.5.0 | 47.4 MB | ######5 | 65%  2025-05-07T19:45:18.0338033Z 2025-05-07T19:45:18.0338040Z 2025-05-07T19:45:18.0338044Z 2025-05-07T19:45:18.0338049Z 2025-05-07T19:45:18.0338054Z 2025-05-07T19:45:18.0338085Z 2025-05-07T19:45:18.0338337Z 2025-05-07T19:45:18.0587290Z libcups-2.3.3 | 4.3 MB | | 0%  2025-05-07T19:45:18.0698828Z openjdk-23.0.1 | 181.3 MB | ##3 | 24% 2025-05-07T19:45:18.0699138Z 2025-05-07T19:45:18.0699145Z 2025-05-07T19:45:18.0699148Z 2025-05-07T19:45:18.0699153Z 2025-05-07T19:45:18.0699157Z 2025-05-07T19:45:18.0699162Z 2025-05-07T19:45:18.0699191Z 2025-05-07T19:45:18.0699198Z 2025-05-07T19:45:18.1061303Z libglib-2.84.0 | 3.8 MB | | 0%  2025-05-07T19:45:18.1061636Z 2025-05-07T19:45:18.1337887Z bazel-7.5.0 | 47.4 MB | #######6 | 76%  2025-05-07T19:45:18.1338211Z 2025-05-07T19:45:18.1338216Z 2025-05-07T19:45:18.1338220Z 2025-05-07T19:45:18.1338223Z 2025-05-07T19:45:18.1338226Z 2025-05-07T19:45:18.1338231Z 2025-05-07T19:45:18.1338236Z 2025-05-07T19:45:18.1930024Z libcups-2.3.3 | 4.3 MB | ########1 | 82%  2025-05-07T19:45:18.2063929Z openjdk-23.0.1 | 181.3 MB | ##7 | 27% 2025-05-07T19:45:18.2064268Z 2025-05-07T19:45:18.2257503Z bazel-7.5.0 | 47.4 MB | ########8 | 89%  2025-05-07T19:45:18.2257797Z 2025-05-07T19:45:18.2257802Z 2025-05-07T19:45:18.2257806Z 2025-05-07T19:45:18.2257809Z 2025-05-07T19:45:18.2257813Z 2025-05-07T19:45:18.2257816Z 2025-05-07T19:45:18.2257820Z 2025-05-07T19:45:18.2306368Z libcups-2.3.3 | 4.3 MB | ########## | 100%  2025-05-07T19:45:18.2306744Z 2025-05-07T19:45:18.2306750Z 2025-05-07T19:45:18.2306753Z 2025-05-07T19:45:18.2306757Z 2025-05-07T19:45:18.2306761Z 2025-05-07T19:45:18.2306764Z 2025-05-07T19:45:18.2306768Z 2025-05-07T19:45:18.2306771Z 2025-05-07T19:45:18.2307070Z libglib-2.84.0 | 3.8 MB | ########## | 100%  2025-05-07T19:45:18.2307359Z 2025-05-07T19:45:18.2307363Z 2025-05-07T19:45:18.2307366Z 2025-05-07T19:45:18.2307370Z 2025-05-07T19:45:18.2307373Z 2025-05-07T19:45:18.2307392Z 2025-05-07T19:45:18.2307405Z 2025-05-07T19:45:18.2307408Z 2025-05-07T19:45:18.2725057Z libglib-2.84.0 | 3.8 MB | ########## | 100%  2025-05-07T19:45:18.2725486Z 2025-05-07T19:45:18.2725491Z 2025-05-07T19:45:18.2725494Z 2025-05-07T19:45:18.2725498Z 2025-05-07T19:45:18.2725501Z 2025-05-07T19:45:18.2725505Z 2025-05-07T19:45:18.2725508Z 2025-05-07T19:45:18.2725512Z 2025-05-07T19:45:18.2725515Z 2025-05-07T19:45:18.2827095Z libprotobuf-5.29.3 | 3.2 MB | | 0%  2025-05-07T19:45:18.2827447Z 2025-05-07T19:45:18.2827452Z 2025-05-07T19:45:18.2827456Z 2025-05-07T19:45:18.2827460Z 2025-05-07T19:45:18.2827463Z 2025-05-07T19:45:18.2827467Z 2025-05-07T19:45:18.2827472Z 2025-05-07T19:45:18.2827476Z 2025-05-07T19:45:18.2827480Z 2025-05-07T19:45:18.2827505Z 2025-05-07T19:45:18.2905991Z tk-8.6.13 | 3.2 MB | | 0%  2025-05-07T19:45:18.2906311Z 2025-05-07T19:45:18.2911897Z 2025-05-07T19:45:18.2912620Z python-3.13.2 | 31.7 MB | ########## | 100%  2025-05-07T19:45:18.2912887Z 2025-05-07T19:45:18.2912895Z 2025-05-07T19:45:18.2930597Z python-3.13.2 | 31.7 MB | ########## | 100%  2025-05-07T19:45:18.3318559Z openjdk-23.0.1 | 181.3 MB | ###2 | 32% 2025-05-07T19:45:18.3318852Z 2025-05-07T19:45:18.3319012Z 2025-05-07T19:45:18.3319022Z 2025-05-07T19:45:18.3319029Z 2025-05-07T19:45:18.3319071Z 2025-05-07T19:45:18.3319076Z 2025-05-07T19:45:18.3319081Z 2025-05-07T19:45:18.3319086Z 2025-05-07T19:45:18.3319091Z 2025-05-07T19:45:18.3319099Z 2025-05-07T19:45:18.3319104Z 2025-05-07T19:45:18.3881220Z font-ttf-ubuntu-0.83 | 1.5 MB | 1 | 1%  2025-05-07T19:45:18.3881614Z 2025-05-07T19:45:18.3881621Z 2025-05-07T19:45:18.3881626Z 2025-05-07T19:45:18.3881629Z 2025-05-07T19:45:18.3881634Z 2025-05-07T19:45:18.3881639Z 2025-05-07T19:45:18.3881643Z 2025-05-07T19:45:18.3881646Z 2025-05-07T19:45:18.3881677Z 2025-05-07T19:45:18.3882499Z 2025-05-07T19:45:18.3882503Z 2025-05-07T19:45:18.3984164Z font-ttf-ubuntu-0.83 | 1.5 MB | ########## | 100%  2025-05-07T19:45:18.3984561Z 2025-05-07T19:45:18.3984566Z 2025-05-07T19:45:18.3984570Z 2025-05-07T19:45:18.3984573Z 2025-05-07T19:45:18.3984577Z 2025-05-07T19:45:18.3984580Z 2025-05-07T19:45:18.3984584Z 2025-05-07T19:45:18.3984587Z 2025-05-07T19:45:18.3984789Z 2025-05-07T19:45:18.3985868Z libprotobuf-5.29.3 | 3.2 MB | ########## | 100%  2025-05-07T19:45:18.3986205Z 2025-05-07T19:45:18.3986209Z 2025-05-07T19:45:18.3986213Z 2025-05-07T19:45:18.3986216Z 2025-05-07T19:45:18.3986220Z 2025-05-07T19:45:18.3986223Z 2025-05-07T19:45:18.3986231Z 2025-05-07T19:45:18.3986235Z 2025-05-07T19:45:18.3986238Z 2025-05-07T19:45:18.4057715Z libprotobuf-5.29.3 | 3.2 MB | ########## | 100%  2025-05-07T19:45:18.4058088Z 2025-05-07T19:45:18.4058093Z 2025-05-07T19:45:18.4058129Z 2025-05-07T19:45:18.4058143Z 2025-05-07T19:45:18.4058146Z 2025-05-07T19:45:18.4058150Z 2025-05-07T19:45:18.4058153Z 2025-05-07T19:45:18.4058157Z 2025-05-07T19:45:18.4058160Z 2025-05-07T19:45:18.4058164Z 2025-05-07T19:45:18.4058411Z tk-8.6.13 | 3.2 MB | ########## | 100%  2025-05-07T19:45:18.4058710Z 2025-05-07T19:45:18.4058714Z 2025-05-07T19:45:18.4058717Z 2025-05-07T19:45:18.4058721Z 2025-05-07T19:45:18.4058724Z 2025-05-07T19:45:18.4058728Z 2025-05-07T19:45:18.4058731Z 2025-05-07T19:45:18.4058735Z 2025-05-07T19:45:18.4058738Z 2025-05-07T19:45:18.4058742Z 2025-05-07T19:45:18.4080205Z tk-8.6.13 | 3.2 MB | ########## | 100%  2025-05-07T19:45:18.4201965Z openjdk-23.0.1 | 181.3 MB | ###6 | 36% 2025-05-07T19:45:18.4202345Z 2025-05-07T19:45:18.4202386Z 2025-05-07T19:45:18.4202390Z 2025-05-07T19:45:18.4202403Z 2025-05-07T19:45:18.4263610Z libgrpc-1.71.0 | 7.6 MB | ########## | 100%  2025-05-07T19:45:18.4263928Z 2025-05-07T19:45:18.4263933Z 2025-05-07T19:45:18.4263936Z 2025-05-07T19:45:18.4263940Z 2025-05-07T19:45:18.4263944Z 2025-05-07T19:45:18.4263947Z 2025-05-07T19:45:18.4280813Z libopenblas-0.3.29 | 5.6 MB | ########## | 100%  2025-05-07T19:45:18.4281127Z 2025-05-07T19:45:18.4281131Z 2025-05-07T19:45:18.4281134Z 2025-05-07T19:45:18.4281138Z 2025-05-07T19:45:18.4281141Z 2025-05-07T19:45:18.4281144Z 2025-05-07T19:45:18.4281148Z 2025-05-07T19:45:18.4281151Z 2025-05-07T19:45:18.4281154Z 2025-05-07T19:45:18.4281158Z 2025-05-07T19:45:18.4281161Z 2025-05-07T19:45:18.4281267Z 2025-05-07T19:45:18.4574991Z harfbuzz-9.0.0 | 1.5 MB | 1 | 1%  2025-05-07T19:45:18.4575591Z 2025-05-07T19:45:18.4575597Z 2025-05-07T19:45:18.4575603Z 2025-05-07T19:45:18.4575608Z 2025-05-07T19:45:18.4575613Z 2025-05-07T19:45:18.4575618Z 2025-05-07T19:45:18.4575624Z 2025-05-07T19:45:18.4576046Z 2025-05-07T19:45:18.4576063Z 2025-05-07T19:45:18.4576069Z 2025-05-07T19:45:18.4576075Z 2025-05-07T19:45:18.4576080Z 2025-05-07T19:45:18.4576084Z 2025-05-07T19:45:18.4617050Z libgfortran5-15.1.0 | 1.5 MB | 1 | 1%  2025-05-07T19:45:18.4617407Z 2025-05-07T19:45:18.4617412Z 2025-05-07T19:45:18.4617416Z 2025-05-07T19:45:18.4617420Z 2025-05-07T19:45:18.4617423Z 2025-05-07T19:45:18.4617427Z 2025-05-07T19:45:18.4617430Z 2025-05-07T19:45:18.4617433Z 2025-05-07T19:45:18.4617461Z 2025-05-07T19:45:18.4617464Z 2025-05-07T19:45:18.4617468Z 2025-05-07T19:45:18.4617471Z 2025-05-07T19:45:18.4617474Z 2025-05-07T19:45:18.4617478Z 2025-05-07T19:45:18.5063185Z krb5-1.21.3 | 1.3 MB | 1 | 1%  2025-05-07T19:45:18.5063520Z 2025-05-07T19:45:18.5063524Z 2025-05-07T19:45:18.5063555Z 2025-05-07T19:45:18.5063559Z 2025-05-07T19:45:18.5063562Z 2025-05-07T19:45:18.5147403Z openblas-0.3.29 | 5.8 MB | ########## | 100%  2025-05-07T19:45:18.5147960Z 2025-05-07T19:45:18.5147964Z 2025-05-07T19:45:18.5147968Z 2025-05-07T19:45:18.5147972Z 2025-05-07T19:45:18.5147975Z 2025-05-07T19:45:18.5148004Z 2025-05-07T19:45:18.5148007Z 2025-05-07T19:45:18.5148010Z 2025-05-07T19:45:18.5148014Z 2025-05-07T19:45:18.5148017Z 2025-05-07T19:45:18.5148020Z 2025-05-07T19:45:18.5148024Z 2025-05-07T19:45:18.5148027Z 2025-05-07T19:45:18.5148031Z 2025-05-07T19:45:18.5186952Z krb5-1.21.3 | 1.3 MB | ########## | 100%  2025-05-07T19:45:18.5187289Z 2025-05-07T19:45:18.5187294Z 2025-05-07T19:45:18.5187298Z 2025-05-07T19:45:18.5187301Z 2025-05-07T19:45:18.5187305Z 2025-05-07T19:45:18.5187308Z 2025-05-07T19:45:18.5187313Z 2025-05-07T19:45:18.5187317Z 2025-05-07T19:45:18.5187321Z 2025-05-07T19:45:18.5187326Z 2025-05-07T19:45:18.5187330Z 2025-05-07T19:45:18.5187334Z 2025-05-07T19:45:18.5187342Z 2025-05-07T19:45:18.5250294Z libgfortran5-15.1.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:18.5679884Z openjdk-23.0.1 | 181.3 MB | #### | 40% 2025-05-07T19:45:18.5680187Z 2025-05-07T19:45:18.5680213Z 2025-05-07T19:45:18.5680219Z 2025-05-07T19:45:18.5680223Z 2025-05-07T19:45:18.5680226Z 2025-05-07T19:45:18.5680231Z 2025-05-07T19:45:18.5680234Z 2025-05-07T19:45:18.5680237Z 2025-05-07T19:45:18.5680241Z 2025-05-07T19:45:18.5680244Z 2025-05-07T19:45:18.5680247Z 2025-05-07T19:45:18.5680251Z 2025-05-07T19:45:18.5680254Z 2025-05-07T19:45:18.5680257Z 2025-05-07T19:45:18.5680260Z 2025-05-07T19:45:18.6011075Z libabseil-20250127.1 | 1.3 MB | 1 | 1%  2025-05-07T19:45:18.6011563Z 2025-05-07T19:45:18.6011571Z 2025-05-07T19:45:18.6011578Z 2025-05-07T19:45:18.6011585Z 2025-05-07T19:45:18.6011593Z 2025-05-07T19:45:18.6011597Z 2025-05-07T19:45:18.6011604Z 2025-05-07T19:45:18.6011608Z 2025-05-07T19:45:18.6011613Z 2025-05-07T19:45:18.6011640Z 2025-05-07T19:45:18.6011654Z 2025-05-07T19:45:18.6011658Z 2025-05-07T19:45:18.6011661Z 2025-05-07T19:45:18.6011664Z 2025-05-07T19:45:18.6011668Z 2025-05-07T19:45:18.6015327Z libabseil-20250127.1 | 1.3 MB | ########## | 100%  2025-05-07T19:45:18.6015674Z 2025-05-07T19:45:18.6015678Z 2025-05-07T19:45:18.6015681Z 2025-05-07T19:45:18.6015685Z 2025-05-07T19:45:18.6015688Z 2025-05-07T19:45:18.6015691Z 2025-05-07T19:45:18.6015695Z 2025-05-07T19:45:18.6015698Z 2025-05-07T19:45:18.6015701Z 2025-05-07T19:45:18.6015705Z 2025-05-07T19:45:18.6015735Z 2025-05-07T19:45:18.6015742Z 2025-05-07T19:45:18.6039769Z harfbuzz-9.0.0 | 1.5 MB | ##3 | 24%  2025-05-07T19:45:18.6040105Z 2025-05-07T19:45:18.6040250Z 2025-05-07T19:45:18.6040258Z 2025-05-07T19:45:18.6040263Z 2025-05-07T19:45:18.6040268Z 2025-05-07T19:45:18.6040286Z 2025-05-07T19:45:18.6040291Z 2025-05-07T19:45:18.6040296Z 2025-05-07T19:45:18.6040515Z 2025-05-07T19:45:18.6040532Z 2025-05-07T19:45:18.6040536Z 2025-05-07T19:45:18.6040542Z 2025-05-07T19:45:18.6040549Z 2025-05-07T19:45:18.6040554Z 2025-05-07T19:45:18.6040558Z 2025-05-07T19:45:18.6040563Z 2025-05-07T19:45:18.6255091Z cairo-1.18.0 | 961 KB | 1 | 2%  2025-05-07T19:45:18.6382980Z openjdk-23.0.1 | 181.3 MB | ####4 | 44% 2025-05-07T19:45:18.6383330Z 2025-05-07T19:45:18.6383335Z 2025-05-07T19:45:18.6383339Z 2025-05-07T19:45:18.6383342Z 2025-05-07T19:45:18.6383346Z 2025-05-07T19:45:18.6383350Z 2025-05-07T19:45:18.6383353Z 2025-05-07T19:45:18.6383357Z 2025-05-07T19:45:18.6383362Z 2025-05-07T19:45:18.6383367Z 2025-05-07T19:45:18.6383372Z 2025-05-07T19:45:18.6383376Z 2025-05-07T19:45:18.6383381Z 2025-05-07T19:45:18.6383385Z 2025-05-07T19:45:18.6383390Z 2025-05-07T19:45:18.6383395Z 2025-05-07T19:45:18.6476684Z cairo-1.18.0 | 961 KB | ########## | 100%  2025-05-07T19:45:18.6477264Z 2025-05-07T19:45:18.6477269Z 2025-05-07T19:45:18.6477274Z 2025-05-07T19:45:18.6477277Z 2025-05-07T19:45:18.6477281Z 2025-05-07T19:45:18.6477284Z 2025-05-07T19:45:18.6477288Z 2025-05-07T19:45:18.6525641Z libcups-2.3.3 | 4.3 MB | ########## | 100%  2025-05-07T19:45:18.6526557Z 2025-05-07T19:45:18.6526571Z 2025-05-07T19:45:18.6526582Z 2025-05-07T19:45:18.6526593Z 2025-05-07T19:45:18.6526603Z 2025-05-07T19:45:18.6526613Z 2025-05-07T19:45:18.6526624Z 2025-05-07T19:45:18.6526634Z 2025-05-07T19:45:18.6526644Z 2025-05-07T19:45:18.6526654Z 2025-05-07T19:45:18.6526689Z 2025-05-07T19:45:18.6526700Z 2025-05-07T19:45:18.6650488Z harfbuzz-9.0.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:18.6650840Z 2025-05-07T19:45:18.6650845Z 2025-05-07T19:45:18.6650849Z 2025-05-07T19:45:18.6650854Z 2025-05-07T19:45:18.6650858Z 2025-05-07T19:45:18.6650889Z 2025-05-07T19:45:18.6650907Z 2025-05-07T19:45:18.6650921Z 2025-05-07T19:45:18.6650925Z 2025-05-07T19:45:18.6650928Z 2025-05-07T19:45:18.6650932Z 2025-05-07T19:45:18.6650935Z 2025-05-07T19:45:18.6650939Z 2025-05-07T19:45:18.6650942Z 2025-05-07T19:45:18.6650946Z 2025-05-07T19:45:18.6650949Z 2025-05-07T19:45:18.6650953Z 2025-05-07T19:45:18.6890500Z pcre2-10.44 | 934 KB | 1 | 2%  2025-05-07T19:45:18.6890836Z 2025-05-07T19:45:18.6899102Z bazel-7.5.0 | 47.4 MB | #########9 | 100%  2025-05-07T19:45:18.6899448Z 2025-05-07T19:45:18.6899453Z 2025-05-07T19:45:18.6899457Z 2025-05-07T19:45:18.6899461Z 2025-05-07T19:45:18.6899464Z 2025-05-07T19:45:18.6899468Z 2025-05-07T19:45:18.6899471Z 2025-05-07T19:45:18.6899475Z 2025-05-07T19:45:18.6899485Z 2025-05-07T19:45:18.6899489Z 2025-05-07T19:45:18.6899492Z 2025-05-07T19:45:18.6899496Z 2025-05-07T19:45:18.6899499Z 2025-05-07T19:45:18.6899503Z 2025-05-07T19:45:18.6899506Z 2025-05-07T19:45:18.6899526Z 2025-05-07T19:45:18.6899539Z 2025-05-07T19:45:18.6899542Z 2025-05-07T19:45:18.6899546Z 2025-05-07T19:45:18.6928050Z ... (more hidden) ... 2025-05-07T19:45:18.6928424Z 2025-05-07T19:45:18.6928432Z 2025-05-07T19:45:18.6928437Z 2025-05-07T19:45:18.6928442Z 2025-05-07T19:45:18.6928448Z 2025-05-07T19:45:18.6928454Z 2025-05-07T19:45:18.6928484Z 2025-05-07T19:45:18.6928489Z 2025-05-07T19:45:18.6928496Z 2025-05-07T19:45:18.6928501Z 2025-05-07T19:45:18.6928507Z 2025-05-07T19:45:18.6928513Z 2025-05-07T19:45:18.6928516Z 2025-05-07T19:45:18.6928520Z 2025-05-07T19:45:18.6928523Z 2025-05-07T19:45:18.6928527Z 2025-05-07T19:45:18.6928530Z 2025-05-07T19:45:18.6928534Z 2025-05-07T19:45:18.7048444Z libsqlite-3.49.2 | 895 KB | 1 | 2%  2025-05-07T19:45:18.7048840Z 2025-05-07T19:45:18.7048845Z 2025-05-07T19:45:18.7048849Z 2025-05-07T19:45:18.7048853Z 2025-05-07T19:45:18.7049114Z 2025-05-07T19:45:18.7049130Z 2025-05-07T19:45:18.7049133Z 2025-05-07T19:45:18.7049137Z 2025-05-07T19:45:18.7049140Z 2025-05-07T19:45:18.7049151Z 2025-05-07T19:45:18.7049155Z 2025-05-07T19:45:18.7049158Z 2025-05-07T19:45:18.7049161Z 2025-05-07T19:45:18.7049165Z 2025-05-07T19:45:18.7049168Z 2025-05-07T19:45:18.7049172Z 2025-05-07T19:45:18.7049175Z 2025-05-07T19:45:18.7241455Z pcre2-10.44 | 934 KB | ########## | 100%  2025-05-07T19:45:18.7241916Z 2025-05-07T19:45:18.7241921Z 2025-05-07T19:45:18.7241925Z 2025-05-07T19:45:18.7241929Z 2025-05-07T19:45:18.7241933Z 2025-05-07T19:45:18.7241937Z 2025-05-07T19:45:18.7241940Z 2025-05-07T19:45:18.7241945Z 2025-05-07T19:45:18.7241971Z 2025-05-07T19:45:18.7241974Z 2025-05-07T19:45:18.7241977Z 2025-05-07T19:45:18.7241981Z 2025-05-07T19:45:18.7241985Z 2025-05-07T19:45:18.7241989Z 2025-05-07T19:45:18.7241992Z 2025-05-07T19:45:18.7241995Z 2025-05-07T19:45:18.7242019Z 2025-05-07T19:45:18.7242023Z 2025-05-07T19:45:18.7242213Z 2025-05-07T19:45:18.7295680Z ... (more hidden) ... 2025-05-07T19:45:18.7296163Z 2025-05-07T19:45:18.7296184Z 2025-05-07T19:45:18.7296191Z 2025-05-07T19:45:18.7296196Z 2025-05-07T19:45:18.7296201Z 2025-05-07T19:45:18.7296206Z 2025-05-07T19:45:18.7296211Z 2025-05-07T19:45:18.7296217Z 2025-05-07T19:45:18.7296227Z 2025-05-07T19:45:18.7296232Z 2025-05-07T19:45:18.7296236Z 2025-05-07T19:45:18.7296241Z 2025-05-07T19:45:18.7296245Z 2025-05-07T19:45:18.7296250Z 2025-05-07T19:45:18.7296293Z 2025-05-07T19:45:18.7296298Z 2025-05-07T19:45:18.7296303Z 2025-05-07T19:45:18.7296307Z 2025-05-07T19:45:18.7331096Z libsqlite-3.49.2 | 895 KB | ########## | 100%  2025-05-07T19:45:18.8346646Z openjdk-23.0.1 | 181.3 MB | ####7 | 48% 2025-05-07T19:45:18.9500528Z openjdk-23.0.1 | 181.3 MB | #####2 | 53% 2025-05-07T19:45:18.9933163Z openjdk-23.0.1 | 181.3 MB | #####6 | 57% 2025-05-07T19:45:18.9933717Z 2025-05-07T19:45:18.9933725Z 2025-05-07T19:45:18.9933730Z 2025-05-07T19:45:18.9933759Z 2025-05-07T19:45:18.9933763Z 2025-05-07T19:45:18.9933766Z 2025-05-07T19:45:18.9933770Z 2025-05-07T19:45:18.9933773Z 2025-05-07T19:45:19.0678540Z libglib-2.84.0 | 3.8 MB | ########## | 100%  2025-05-07T19:45:19.0850467Z openjdk-23.0.1 | 181.3 MB | ###### | 61% 2025-05-07T19:45:19.0851062Z 2025-05-07T19:45:19.0851074Z 2025-05-07T19:45:19.0851083Z 2025-05-07T19:45:19.0851100Z 2025-05-07T19:45:19.0851106Z 2025-05-07T19:45:19.0851110Z 2025-05-07T19:45:19.0851115Z 2025-05-07T19:45:19.0851120Z 2025-05-07T19:45:19.0851125Z 2025-05-07T19:45:19.0851130Z 2025-05-07T19:45:19.0851135Z 2025-05-07T19:45:19.0853168Z font-ttf-ubuntu-0.83 | 1.5 MB | ########## | 100%  2025-05-07T19:45:19.0853544Z 2025-05-07T19:45:19.0853550Z 2025-05-07T19:45:19.0853579Z 2025-05-07T19:45:19.0853597Z 2025-05-07T19:45:19.0853600Z 2025-05-07T19:45:19.0853604Z 2025-05-07T19:45:19.0853608Z 2025-05-07T19:45:19.0853618Z 2025-05-07T19:45:19.0853622Z 2025-05-07T19:45:19.0853626Z 2025-05-07T19:45:19.0855393Z 2025-05-07T19:45:19.1680171Z font-ttf-ubuntu-0.83 | 1.5 MB | ########## | 100%  2025-05-07T19:45:19.2783113Z openjdk-23.0.1 | 181.3 MB | ######4 | 65% 2025-05-07T19:45:19.4029971Z openjdk-23.0.1 | 181.3 MB | ######8 | 69% 2025-05-07T19:45:19.4899907Z openjdk-23.0.1 | 181.3 MB | #######2 | 72% 2025-05-07T19:45:19.4900189Z 2025-05-07T19:45:19.5033977Z bazel-7.5.0 | 47.4 MB | ########## | 100%  2025-05-07T19:45:19.6034277Z openjdk-23.0.1 | 181.3 MB | #######5 | 76% 2025-05-07T19:45:19.7280337Z openjdk-23.0.1 | 181.3 MB | #######9 | 79% 2025-05-07T19:45:19.7689470Z openjdk-23.0.1 | 181.3 MB | ########2 | 83% 2025-05-07T19:45:19.7689867Z 2025-05-07T19:45:19.7690360Z 2025-05-07T19:45:19.7690398Z 2025-05-07T19:45:19.7690405Z 2025-05-07T19:45:19.7690425Z 2025-05-07T19:45:19.7690430Z 2025-05-07T19:45:19.7690436Z 2025-05-07T19:45:19.7690442Z 2025-05-07T19:45:19.7690446Z 2025-05-07T19:45:19.8329974Z libprotobuf-5.29.3 | 3.2 MB | ########## | 100%  2025-05-07T19:45:19.9328086Z openjdk-23.0.1 | 181.3 MB | ########6 | 86% 2025-05-07T19:45:20.0329394Z openjdk-23.0.1 | 181.3 MB | ########9 | 90% 2025-05-07T19:45:20.1329846Z openjdk-23.0.1 | 181.3 MB | #########3 | 94% 2025-05-07T19:45:20.2863314Z openjdk-23.0.1 | 181.3 MB | #########7 | 98% 2025-05-07T19:45:20.2863620Z 2025-05-07T19:45:20.2863644Z 2025-05-07T19:45:20.2863648Z 2025-05-07T19:45:20.2863652Z 2025-05-07T19:45:20.2863657Z 2025-05-07T19:45:20.2863660Z 2025-05-07T19:45:20.2863665Z 2025-05-07T19:45:20.2863668Z 2025-05-07T19:45:20.2863672Z 2025-05-07T19:45:20.2863676Z 2025-05-07T19:45:20.4583004Z tk-8.6.13 | 3.2 MB | ########## | 100%  2025-05-07T19:45:20.4583584Z 2025-05-07T19:45:20.4583606Z 2025-05-07T19:45:20.4583610Z 2025-05-07T19:45:20.4583614Z 2025-05-07T19:45:20.4583617Z 2025-05-07T19:45:20.4583621Z 2025-05-07T19:45:20.4583624Z 2025-05-07T19:45:20.4583627Z 2025-05-07T19:45:20.4583630Z 2025-05-07T19:45:20.4583634Z 2025-05-07T19:45:20.4583637Z 2025-05-07T19:45:20.4583640Z 2025-05-07T19:45:20.4583643Z 2025-05-07T19:45:20.4583646Z 2025-05-07T19:45:20.4585998Z krb5-1.21.3 | 1.3 MB | ########## | 100%  2025-05-07T19:45:20.4586300Z 2025-05-07T19:45:20.4586304Z 2025-05-07T19:45:20.4586307Z 2025-05-07T19:45:20.4586310Z 2025-05-07T19:45:20.4586314Z 2025-05-07T19:45:20.4586317Z 2025-05-07T19:45:20.4586320Z 2025-05-07T19:45:20.4586330Z 2025-05-07T19:45:20.4586333Z 2025-05-07T19:45:20.4586337Z 2025-05-07T19:45:20.4586340Z 2025-05-07T19:45:20.4586344Z 2025-05-07T19:45:20.4586370Z 2025-05-07T19:45:20.4586385Z 2025-05-07T19:45:20.5656510Z krb5-1.21.3 | 1.3 MB | ########## | 100%  2025-05-07T19:45:20.5656894Z 2025-05-07T19:45:20.5656899Z 2025-05-07T19:45:20.5656902Z 2025-05-07T19:45:20.5656906Z 2025-05-07T19:45:20.5656910Z 2025-05-07T19:45:20.5656913Z 2025-05-07T19:45:20.5656916Z 2025-05-07T19:45:20.5656920Z 2025-05-07T19:45:20.5656923Z 2025-05-07T19:45:20.5656926Z 2025-05-07T19:45:20.5656930Z 2025-05-07T19:45:20.5656933Z 2025-05-07T19:45:20.5656937Z 2025-05-07T19:45:20.5661274Z libgfortran5-15.1.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:20.5661609Z 2025-05-07T19:45:20.5661612Z 2025-05-07T19:45:20.5661616Z 2025-05-07T19:45:20.5661619Z 2025-05-07T19:45:20.5661623Z 2025-05-07T19:45:20.5661626Z 2025-05-07T19:45:20.5661630Z 2025-05-07T19:45:20.5661633Z 2025-05-07T19:45:20.5661636Z 2025-05-07T19:45:20.5661664Z 2025-05-07T19:45:20.5661667Z 2025-05-07T19:45:20.5661670Z 2025-05-07T19:45:20.5662364Z 2025-05-07T19:45:21.0620311Z libgfortran5-15.1.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:21.0620673Z 2025-05-07T19:45:21.0620679Z 2025-05-07T19:45:21.0620685Z 2025-05-07T19:45:21.0620691Z 2025-05-07T19:45:21.0620696Z 2025-05-07T19:45:21.0620727Z 2025-05-07T19:45:21.0620731Z 2025-05-07T19:45:21.0620736Z 2025-05-07T19:45:21.0620740Z 2025-05-07T19:45:21.0620746Z 2025-05-07T19:45:21.0620751Z 2025-05-07T19:45:21.0620754Z 2025-05-07T19:45:21.0620757Z 2025-05-07T19:45:21.0620760Z 2025-05-07T19:45:21.0623462Z 2025-05-07T19:45:21.0625853Z libabseil-20250127.1 | 1.3 MB | ########## | 100%  2025-05-07T19:45:21.0626232Z 2025-05-07T19:45:21.0626237Z 2025-05-07T19:45:21.0626241Z 2025-05-07T19:45:21.0626246Z 2025-05-07T19:45:21.0626249Z 2025-05-07T19:45:21.0626252Z 2025-05-07T19:45:21.0626256Z 2025-05-07T19:45:21.0626261Z 2025-05-07T19:45:21.0626265Z 2025-05-07T19:45:21.0626271Z 2025-05-07T19:45:21.0626525Z 2025-05-07T19:45:21.0626552Z 2025-05-07T19:45:21.0626557Z 2025-05-07T19:45:21.0626586Z 2025-05-07T19:45:21.0626599Z 2025-05-07T19:45:21.1489576Z libabseil-20250127.1 | 1.3 MB | ########## | 100%  2025-05-07T19:45:21.1489951Z 2025-05-07T19:45:21.1490224Z 2025-05-07T19:45:21.1490233Z 2025-05-07T19:45:21.1490240Z 2025-05-07T19:45:21.1490246Z 2025-05-07T19:45:21.1490251Z 2025-05-07T19:45:21.1490257Z 2025-05-07T19:45:21.1490262Z 2025-05-07T19:45:21.1490268Z 2025-05-07T19:45:21.1490274Z 2025-05-07T19:45:21.1490281Z 2025-05-07T19:45:21.1490287Z 2025-05-07T19:45:21.1490293Z 2025-05-07T19:45:21.1490300Z 2025-05-07T19:45:21.1490305Z 2025-05-07T19:45:21.1490310Z 2025-05-07T19:45:21.1490880Z cairo-1.18.0 | 961 KB | ########## | 100%  2025-05-07T19:45:21.1491215Z 2025-05-07T19:45:21.1491219Z 2025-05-07T19:45:21.1491223Z 2025-05-07T19:45:21.1491226Z 2025-05-07T19:45:21.1491230Z 2025-05-07T19:45:21.1491248Z 2025-05-07T19:45:21.1491461Z 2025-05-07T19:45:21.1491465Z 2025-05-07T19:45:21.1491468Z 2025-05-07T19:45:21.1491472Z 2025-05-07T19:45:21.1491476Z 2025-05-07T19:45:21.1491479Z 2025-05-07T19:45:21.1491483Z 2025-05-07T19:45:21.1491487Z 2025-05-07T19:45:21.1491490Z 2025-05-07T19:45:21.1491519Z 2025-05-07T19:45:21.2646499Z cairo-1.18.0 | 961 KB | ########## | 100%  2025-05-07T19:45:21.2646930Z 2025-05-07T19:45:21.2647150Z 2025-05-07T19:45:21.2647154Z 2025-05-07T19:45:21.2647173Z 2025-05-07T19:45:21.2647177Z 2025-05-07T19:45:21.2647219Z 2025-05-07T19:45:21.2647223Z 2025-05-07T19:45:21.2647228Z 2025-05-07T19:45:21.2647231Z 2025-05-07T19:45:21.2647235Z 2025-05-07T19:45:21.2647239Z 2025-05-07T19:45:21.2647251Z 2025-05-07T19:45:21.2647911Z harfbuzz-9.0.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:21.2648242Z 2025-05-07T19:45:21.2648247Z 2025-05-07T19:45:21.2648266Z 2025-05-07T19:45:21.2648280Z 2025-05-07T19:45:21.2648284Z 2025-05-07T19:45:21.2648288Z 2025-05-07T19:45:21.2648301Z 2025-05-07T19:45:21.2648305Z 2025-05-07T19:45:21.2648308Z 2025-05-07T19:45:21.2648312Z 2025-05-07T19:45:21.2648315Z 2025-05-07T19:45:21.2648341Z 2025-05-07T19:45:21.4377794Z harfbuzz-9.0.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:21.4378141Z 2025-05-07T19:45:21.4378145Z 2025-05-07T19:45:21.4630849Z python-3.13.2 | 31.7 MB | ########## | 100%  2025-05-07T19:45:21.4631235Z 2025-05-07T19:45:21.4631469Z 2025-05-07T19:45:21.4631478Z 2025-05-07T19:45:21.4631482Z 2025-05-07T19:45:21.4631487Z 2025-05-07T19:45:21.4631492Z 2025-05-07T19:45:21.4631497Z 2025-05-07T19:45:21.4631501Z 2025-05-07T19:45:21.4631506Z 2025-05-07T19:45:21.4631510Z 2025-05-07T19:45:21.4631514Z 2025-05-07T19:45:21.4631518Z 2025-05-07T19:45:21.4631557Z 2025-05-07T19:45:21.4631562Z 2025-05-07T19:45:21.4631566Z 2025-05-07T19:45:21.4631571Z 2025-05-07T19:45:21.4631607Z 2025-05-07T19:45:21.4635151Z pcre2-10.44 | 934 KB | ########## | 100%  2025-05-07T19:45:21.4635499Z 2025-05-07T19:45:21.4635505Z 2025-05-07T19:45:21.4635509Z 2025-05-07T19:45:21.4635512Z 2025-05-07T19:45:21.4635516Z 2025-05-07T19:45:21.4635520Z 2025-05-07T19:45:21.4635524Z 2025-05-07T19:45:21.4635527Z 2025-05-07T19:45:21.4635531Z 2025-05-07T19:45:21.4635534Z 2025-05-07T19:45:21.4635538Z 2025-05-07T19:45:21.4635542Z 2025-05-07T19:45:21.4635546Z 2025-05-07T19:45:21.4635549Z 2025-05-07T19:45:21.4635552Z 2025-05-07T19:45:21.4635556Z 2025-05-07T19:45:21.4635592Z 2025-05-07T19:45:21.5050012Z pcre2-10.44 | 934 KB | ########## | 100%  2025-05-07T19:45:21.5050495Z 2025-05-07T19:45:21.5050614Z 2025-05-07T19:45:21.5050620Z 2025-05-07T19:45:21.5050624Z 2025-05-07T19:45:21.5050629Z 2025-05-07T19:45:21.5050677Z 2025-05-07T19:45:21.5050683Z 2025-05-07T19:45:21.5051103Z 2025-05-07T19:45:21.5051125Z 2025-05-07T19:45:21.5051133Z 2025-05-07T19:45:21.5051141Z 2025-05-07T19:45:21.5051146Z 2025-05-07T19:45:21.5051153Z 2025-05-07T19:45:21.5051161Z 2025-05-07T19:45:21.5051169Z 2025-05-07T19:45:21.5051174Z 2025-05-07T19:45:21.5051207Z 2025-05-07T19:45:21.5051214Z 2025-05-07T19:45:21.5051808Z libsqlite-3.49.2 | 895 KB | ########## | 100%  2025-05-07T19:45:21.5052178Z 2025-05-07T19:45:21.5052181Z 2025-05-07T19:45:21.5052185Z 2025-05-07T19:45:21.5052189Z 2025-05-07T19:45:21.5052192Z 2025-05-07T19:45:21.5052196Z 2025-05-07T19:45:21.5052199Z 2025-05-07T19:45:21.5052203Z 2025-05-07T19:45:21.5052206Z 2025-05-07T19:45:21.5052210Z 2025-05-07T19:45:21.5052213Z 2025-05-07T19:45:21.5052217Z 2025-05-07T19:45:21.5052220Z 2025-05-07T19:45:21.5052224Z 2025-05-07T19:45:21.5052227Z 2025-05-07T19:45:21.5052230Z 2025-05-07T19:45:21.5052234Z 2025-05-07T19:45:21.5052238Z 2025-05-07T19:45:22.3457790Z libsqlite-3.49.2 | 895 KB | ########## | 100%  2025-05-07T19:45:22.3458446Z 2025-05-07T19:45:22.3458461Z 2025-05-07T19:45:22.3458465Z 2025-05-07T19:45:22.4005740Z cmake-4.0.2 | 19.4 MB | ########## | 100%  2025-05-07T19:45:23.1942297Z openjdk-23.0.1 | 181.3 MB | ########## | 100% 2025-05-07T19:45:23.1942832Z 2025-05-07T19:45:23.1942839Z 2025-05-07T19:45:23.1942845Z 2025-05-07T19:45:23.1942850Z 2025-05-07T19:45:23.1942858Z 2025-05-07T19:45:23.1942865Z 2025-05-07T19:45:23.1942870Z 2025-05-07T19:45:23.1942876Z 2025-05-07T19:45:23.1942880Z 2025-05-07T19:45:23.1942908Z 2025-05-07T19:45:23.1942914Z 2025-05-07T19:45:23.1942920Z 2025-05-07T19:45:23.1942926Z 2025-05-07T19:45:23.1942930Z 2025-05-07T19:45:23.1942934Z 2025-05-07T19:45:23.1942937Z 2025-05-07T19:45:23.1942941Z 2025-05-07T19:45:23.1942946Z 2025-05-07T19:45:23.1942950Z 2025-05-07T19:45:23.1943342Z ... (more hidden) ... 2025-05-07T19:45:23.1943687Z 2025-05-07T19:45:23.1943691Z 2025-05-07T19:45:23.1943695Z 2025-05-07T19:45:23.1943698Z 2025-05-07T19:45:23.1943702Z 2025-05-07T19:45:23.1943705Z 2025-05-07T19:45:23.1943708Z 2025-05-07T19:45:23.1943712Z 2025-05-07T19:45:23.1943715Z 2025-05-07T19:45:23.1943718Z 2025-05-07T19:45:23.1943722Z 2025-05-07T19:45:23.1943725Z 2025-05-07T19:45:23.1943729Z 2025-05-07T19:45:23.1943732Z 2025-05-07T19:45:23.1943735Z 2025-05-07T19:45:23.1943739Z 2025-05-07T19:45:23.1943742Z 2025-05-07T19:45:23.1943746Z 2025-05-07T19:45:23.1943749Z 2025-05-07T19:45:23.9738180Z ... (more hidden) ... 2025-05-07T19:45:23.9738564Z 2025-05-07T19:45:24.6188610Z bazel-7.5.0 | 47.4 MB | ########## | 100%  2025-05-07T19:45:24.6203153Z openjdk-23.0.1 | 181.3 MB | ########## | 100% 2025-05-07T19:45:24.6204654Z 2025-05-07T19:45:24.6204677Z 2025-05-07T19:45:24.6204699Z 2025-05-07T19:45:24.6204781Z 2025-05-07T19:45:24.6204827Z 2025-05-07T19:45:24.6204844Z 2025-05-07T19:45:24.6204858Z 2025-05-07T19:45:24.6204873Z 2025-05-07T19:45:24.6204888Z 2025-05-07T19:45:24.6204904Z 2025-05-07T19:45:24.6204922Z 2025-05-07T19:45:24.6204937Z 2025-05-07T19:45:24.6204954Z 2025-05-07T19:45:24.6204976Z 2025-05-07T19:45:24.6205032Z 2025-05-07T19:45:24.6205049Z 2025-05-07T19:45:24.6205064Z 2025-05-07T19:45:24.6205080Z 2025-05-07T19:45:24.6205094Z 2025-05-07T19:45:24.6205479Z 2025-05-07T19:45:24.6207278Z  2025-05-07T19:45:24.6208935Z 2025-05-07T19:45:24.6209927Z 2025-05-07T19:45:24.6210796Z  2025-05-07T19:45:24.6211095Z 2025-05-07T19:45:24.6211101Z 2025-05-07T19:45:24.6211427Z  2025-05-07T19:45:24.6211850Z 2025-05-07T19:45:24.6211856Z 2025-05-07T19:45:24.6212320Z 2025-05-07T19:45:24.6212708Z  2025-05-07T19:45:24.6213127Z 2025-05-07T19:45:24.6213136Z 2025-05-07T19:45:24.6213141Z 2025-05-07T19:45:24.6213150Z 2025-05-07T19:45:24.6213485Z  2025-05-07T19:45:24.6213819Z 2025-05-07T19:45:24.6213824Z 2025-05-07T19:45:24.6213829Z 2025-05-07T19:45:24.6213836Z 2025-05-07T19:45:24.6213841Z 2025-05-07T19:45:24.6214203Z  2025-05-07T19:45:24.6214577Z 2025-05-07T19:45:24.6214585Z 2025-05-07T19:45:24.6214591Z 2025-05-07T19:45:24.6214596Z 2025-05-07T19:45:24.6214600Z 2025-05-07T19:45:24.6214606Z 2025-05-07T19:45:24.6214893Z  2025-05-07T19:45:24.6215448Z 2025-05-07T19:45:24.6215457Z 2025-05-07T19:45:24.6215462Z 2025-05-07T19:45:24.6215468Z 2025-05-07T19:45:24.6215473Z 2025-05-07T19:45:24.6215484Z 2025-05-07T19:45:24.6215489Z 2025-05-07T19:45:24.6216001Z  2025-05-07T19:45:24.6216466Z 2025-05-07T19:45:24.6216472Z 2025-05-07T19:45:24.6216480Z 2025-05-07T19:45:24.6216489Z 2025-05-07T19:45:24.6216497Z 2025-05-07T19:45:24.6216502Z 2025-05-07T19:45:24.6216510Z 2025-05-07T19:45:24.6216518Z 2025-05-07T19:45:24.6216861Z  2025-05-07T19:45:24.6217318Z 2025-05-07T19:45:24.6217361Z 2025-05-07T19:45:24.6217369Z 2025-05-07T19:45:24.6217375Z 2025-05-07T19:45:24.6217382Z 2025-05-07T19:45:24.6217387Z 2025-05-07T19:45:24.6217392Z 2025-05-07T19:45:24.6217398Z 2025-05-07T19:45:24.6217403Z 2025-05-07T19:45:24.6217649Z  2025-05-07T19:45:24.6217891Z 2025-05-07T19:45:24.6217894Z 2025-05-07T19:45:24.6217925Z 2025-05-07T19:45:24.6217929Z 2025-05-07T19:45:24.6217932Z 2025-05-07T19:45:24.6217942Z 2025-05-07T19:45:24.6217952Z 2025-05-07T19:45:24.6217955Z 2025-05-07T19:45:24.6217959Z 2025-05-07T19:45:24.6217963Z 2025-05-07T19:45:24.6218174Z  2025-05-07T19:45:24.6218462Z 2025-05-07T19:45:24.6218468Z 2025-05-07T19:45:24.6218511Z 2025-05-07T19:45:24.6218515Z 2025-05-07T19:45:24.6218520Z 2025-05-07T19:45:24.6218526Z 2025-05-07T19:45:24.6218531Z 2025-05-07T19:45:24.6218536Z 2025-05-07T19:45:24.6218543Z 2025-05-07T19:45:24.6218548Z 2025-05-07T19:45:24.6218554Z 2025-05-07T19:45:24.6218779Z  2025-05-07T19:45:24.6219073Z 2025-05-07T19:45:24.6219110Z 2025-05-07T19:45:24.6219114Z 2025-05-07T19:45:24.6219117Z 2025-05-07T19:45:24.6219121Z 2025-05-07T19:45:24.6219124Z 2025-05-07T19:45:24.6219128Z 2025-05-07T19:45:24.6219132Z 2025-05-07T19:45:24.6219135Z 2025-05-07T19:45:24.6219138Z 2025-05-07T19:45:24.6219148Z 2025-05-07T19:45:24.6219151Z 2025-05-07T19:45:24.6219424Z  2025-05-07T19:45:24.6219701Z 2025-05-07T19:45:24.6219705Z 2025-05-07T19:45:24.6219709Z 2025-05-07T19:45:24.6219712Z 2025-05-07T19:45:24.6219716Z 2025-05-07T19:45:24.6219720Z 2025-05-07T19:45:24.6219723Z 2025-05-07T19:45:24.6219727Z 2025-05-07T19:45:24.6219730Z 2025-05-07T19:45:24.6219733Z 2025-05-07T19:45:24.6219737Z 2025-05-07T19:45:24.6219740Z 2025-05-07T19:45:24.6219744Z 2025-05-07T19:45:24.6220014Z  2025-05-07T19:45:24.6220377Z 2025-05-07T19:45:24.6220382Z 2025-05-07T19:45:24.6220389Z 2025-05-07T19:45:24.6220397Z 2025-05-07T19:45:24.6220403Z 2025-05-07T19:45:24.6220411Z 2025-05-07T19:45:24.6220419Z 2025-05-07T19:45:24.6220428Z 2025-05-07T19:45:24.6220434Z 2025-05-07T19:45:24.6220442Z 2025-05-07T19:45:24.6220446Z 2025-05-07T19:45:24.6220451Z 2025-05-07T19:45:24.6220584Z 2025-05-07T19:45:24.6220601Z 2025-05-07T19:45:24.6221027Z  2025-05-07T19:45:24.6221440Z 2025-05-07T19:45:24.6221445Z 2025-05-07T19:45:24.6221450Z 2025-05-07T19:45:24.6221458Z 2025-05-07T19:45:24.6221464Z 2025-05-07T19:45:24.6221472Z 2025-05-07T19:45:24.6221479Z 2025-05-07T19:45:24.6221487Z 2025-05-07T19:45:24.6221493Z 2025-05-07T19:45:24.6221500Z 2025-05-07T19:45:24.6221508Z 2025-05-07T19:45:24.6221514Z 2025-05-07T19:45:24.6221522Z 2025-05-07T19:45:24.6221529Z 2025-05-07T19:45:24.6221537Z 2025-05-07T19:45:24.6221950Z  2025-05-07T19:45:24.6222362Z 2025-05-07T19:45:24.6222367Z 2025-05-07T19:45:24.6222372Z 2025-05-07T19:45:24.6222380Z 2025-05-07T19:45:24.6222389Z 2025-05-07T19:45:24.6222393Z 2025-05-07T19:45:24.6222401Z 2025-05-07T19:45:24.6222407Z 2025-05-07T19:45:24.6222422Z 2025-05-07T19:45:24.6222555Z 2025-05-07T19:45:24.6222560Z 2025-05-07T19:45:24.6222565Z 2025-05-07T19:45:24.6222570Z 2025-05-07T19:45:24.6222575Z 2025-05-07T19:45:24.6222580Z 2025-05-07T19:45:24.6222584Z 2025-05-07T19:45:24.6223227Z  2025-05-07T19:45:24.6223570Z 2025-05-07T19:45:24.6223575Z 2025-05-07T19:45:24.6223618Z 2025-05-07T19:45:24.6223623Z 2025-05-07T19:45:24.6223628Z 2025-05-07T19:45:24.6223633Z 2025-05-07T19:45:24.6223638Z 2025-05-07T19:45:24.6223643Z 2025-05-07T19:45:24.6223648Z 2025-05-07T19:45:24.6223657Z 2025-05-07T19:45:24.6223663Z 2025-05-07T19:45:24.6223670Z 2025-05-07T19:45:24.6223678Z 2025-05-07T19:45:24.6223686Z 2025-05-07T19:45:24.6223692Z 2025-05-07T19:45:24.6223699Z 2025-05-07T19:45:24.6223707Z 2025-05-07T19:45:24.6224167Z  2025-05-07T19:45:24.6224574Z 2025-05-07T19:45:24.6224581Z 2025-05-07T19:45:24.6224590Z 2025-05-07T19:45:24.6224594Z 2025-05-07T19:45:24.6224598Z 2025-05-07T19:45:24.6224602Z 2025-05-07T19:45:24.6224605Z 2025-05-07T19:45:24.6224609Z 2025-05-07T19:45:24.6224613Z 2025-05-07T19:45:24.6224616Z 2025-05-07T19:45:24.6224619Z 2025-05-07T19:45:24.6224623Z 2025-05-07T19:45:24.6224626Z 2025-05-07T19:45:24.6224630Z 2025-05-07T19:45:24.6224633Z 2025-05-07T19:45:24.6224636Z 2025-05-07T19:45:24.6224640Z 2025-05-07T19:45:24.6224644Z 2025-05-07T19:45:24.6224917Z  2025-05-07T19:45:24.6225182Z 2025-05-07T19:45:24.6225185Z 2025-05-07T19:45:24.6225300Z  2025-05-07T19:45:24.6225439Z 2025-05-07T19:45:24.6225443Z 2025-05-07T19:45:24.6225558Z  2025-05-07T19:45:24.6225689Z 2025-05-07T19:45:24.6225693Z 2025-05-07T19:45:24.6225697Z 2025-05-07T19:45:24.6225845Z  2025-05-07T19:45:24.6225973Z 2025-05-07T19:45:24.6225980Z 2025-05-07T19:45:24.6225983Z 2025-05-07T19:45:24.6225990Z 2025-05-07T19:45:24.6226111Z  2025-05-07T19:45:24.6226268Z 2025-05-07T19:45:24.6226271Z 2025-05-07T19:45:24.6226275Z 2025-05-07T19:45:24.6226278Z 2025-05-07T19:45:24.6226282Z 2025-05-07T19:45:24.6226401Z  2025-05-07T19:45:24.6226545Z 2025-05-07T19:45:24.6226549Z 2025-05-07T19:45:24.6226574Z 2025-05-07T19:45:24.6226578Z 2025-05-07T19:45:24.6226581Z 2025-05-07T19:45:24.6226585Z 2025-05-07T19:45:24.6226707Z  2025-05-07T19:45:24.6226854Z 2025-05-07T19:45:24.6226858Z 2025-05-07T19:45:24.6226862Z 2025-05-07T19:45:24.6226865Z 2025-05-07T19:45:24.6226870Z 2025-05-07T19:45:24.6226874Z 2025-05-07T19:45:24.6226879Z 2025-05-07T19:45:24.6227041Z  2025-05-07T19:45:24.6227217Z 2025-05-07T19:45:24.6227223Z 2025-05-07T19:45:24.6227229Z 2025-05-07T19:45:24.6227235Z 2025-05-07T19:45:24.6227240Z 2025-05-07T19:45:24.6227244Z 2025-05-07T19:45:24.6227249Z 2025-05-07T19:45:24.6228413Z 2025-05-07T19:45:24.6228602Z  2025-05-07T19:45:24.6228778Z 2025-05-07T19:45:24.6228782Z 2025-05-07T19:45:24.6228785Z 2025-05-07T19:45:24.6228789Z 2025-05-07T19:45:24.6228792Z 2025-05-07T19:45:24.6228797Z 2025-05-07T19:45:24.6228800Z 2025-05-07T19:45:24.6228803Z 2025-05-07T19:45:24.6228806Z 2025-05-07T19:45:24.6228977Z  2025-05-07T19:45:24.6229153Z 2025-05-07T19:45:24.6229157Z 2025-05-07T19:45:24.6229160Z 2025-05-07T19:45:24.6229164Z 2025-05-07T19:45:24.6229167Z 2025-05-07T19:45:24.6229171Z 2025-05-07T19:45:24.6229174Z 2025-05-07T19:45:24.6229178Z 2025-05-07T19:45:24.6229181Z 2025-05-07T19:45:24.6229185Z 2025-05-07T19:45:24.6229329Z  2025-05-07T19:45:24.6229533Z 2025-05-07T19:45:24.6229537Z 2025-05-07T19:45:24.6229541Z 2025-05-07T19:45:24.6229544Z 2025-05-07T19:45:24.6229548Z 2025-05-07T19:45:24.6229551Z 2025-05-07T19:45:24.6229555Z 2025-05-07T19:45:24.6229558Z 2025-05-07T19:45:24.6229562Z 2025-05-07T19:45:24.6229569Z 2025-05-07T19:45:24.6229650Z 2025-05-07T19:45:24.6229827Z  2025-05-07T19:45:24.6230025Z 2025-05-07T19:45:24.6230029Z 2025-05-07T19:45:24.6230033Z 2025-05-07T19:45:24.6230036Z 2025-05-07T19:45:24.6230040Z 2025-05-07T19:45:24.6230044Z 2025-05-07T19:45:24.6230047Z 2025-05-07T19:45:24.6230050Z 2025-05-07T19:45:24.6230054Z 2025-05-07T19:45:24.6230057Z 2025-05-07T19:45:24.6230061Z 2025-05-07T19:45:24.6230064Z 2025-05-07T19:45:24.6230236Z  2025-05-07T19:45:24.6230436Z 2025-05-07T19:45:24.6230440Z 2025-05-07T19:45:24.6230444Z 2025-05-07T19:45:24.6230447Z 2025-05-07T19:45:24.6230451Z 2025-05-07T19:45:24.6230454Z 2025-05-07T19:45:24.6230457Z 2025-05-07T19:45:24.6230461Z 2025-05-07T19:45:24.6230464Z 2025-05-07T19:45:24.6230467Z 2025-05-07T19:45:24.6230471Z 2025-05-07T19:45:24.6230474Z 2025-05-07T19:45:24.6230478Z 2025-05-07T19:45:24.6230656Z  2025-05-07T19:45:24.6230872Z 2025-05-07T19:45:24.6230880Z 2025-05-07T19:45:24.6230884Z 2025-05-07T19:45:24.6230887Z 2025-05-07T19:45:24.6230891Z 2025-05-07T19:45:24.6230894Z 2025-05-07T19:45:24.6230897Z 2025-05-07T19:45:24.6230901Z 2025-05-07T19:45:24.6230905Z 2025-05-07T19:45:24.6230908Z 2025-05-07T19:45:24.6230911Z 2025-05-07T19:45:24.6230915Z 2025-05-07T19:45:24.6230918Z 2025-05-07T19:45:24.6230944Z 2025-05-07T19:45:24.6231097Z  2025-05-07T19:45:24.6231309Z 2025-05-07T19:45:24.6231312Z 2025-05-07T19:45:24.6231316Z 2025-05-07T19:45:24.6231319Z 2025-05-07T19:45:24.6231323Z 2025-05-07T19:45:24.6231326Z 2025-05-07T19:45:24.6231329Z 2025-05-07T19:45:24.6231333Z 2025-05-07T19:45:24.6231336Z 2025-05-07T19:45:24.6231340Z 2025-05-07T19:45:24.6231369Z 2025-05-07T19:45:24.6231372Z 2025-05-07T19:45:24.6231375Z 2025-05-07T19:45:24.6231379Z 2025-05-07T19:45:24.6231382Z 2025-05-07T19:45:24.6231541Z  2025-05-07T19:45:24.6231760Z 2025-05-07T19:45:24.6231768Z 2025-05-07T19:45:24.6231775Z 2025-05-07T19:45:24.6231778Z 2025-05-07T19:45:24.6231782Z 2025-05-07T19:45:24.6231808Z 2025-05-07T19:45:24.6231811Z 2025-05-07T19:45:24.6231815Z 2025-05-07T19:45:24.6231818Z 2025-05-07T19:45:24.6231821Z 2025-05-07T19:45:24.6231825Z 2025-05-07T19:45:24.6231828Z 2025-05-07T19:45:24.6231832Z 2025-05-07T19:45:24.6231836Z 2025-05-07T19:45:24.6231839Z 2025-05-07T19:45:24.6231842Z 2025-05-07T19:45:24.6232007Z  2025-05-07T19:45:24.6232257Z 2025-05-07T19:45:24.6232261Z 2025-05-07T19:45:24.6232265Z 2025-05-07T19:45:24.6232268Z 2025-05-07T19:45:24.6232272Z 2025-05-07T19:45:24.6232275Z 2025-05-07T19:45:24.6232278Z 2025-05-07T19:45:24.6232282Z 2025-05-07T19:45:24.6232285Z 2025-05-07T19:45:24.6232289Z 2025-05-07T19:45:24.6232292Z 2025-05-07T19:45:24.6232296Z 2025-05-07T19:45:24.6232299Z 2025-05-07T19:45:24.6232303Z 2025-05-07T19:45:24.6232306Z 2025-05-07T19:45:24.6232310Z 2025-05-07T19:45:24.6232377Z 2025-05-07T19:45:24.6232552Z  2025-05-07T19:45:24.6232807Z 2025-05-07T19:45:24.6232811Z 2025-05-07T19:45:24.6232814Z 2025-05-07T19:45:24.6232817Z 2025-05-07T19:45:24.6232821Z 2025-05-07T19:45:24.6232824Z 2025-05-07T19:45:24.6232828Z 2025-05-07T19:45:24.6232831Z 2025-05-07T19:45:24.6232835Z 2025-05-07T19:45:24.6232838Z 2025-05-07T19:45:24.6232842Z 2025-05-07T19:45:24.6232845Z 2025-05-07T19:45:24.6232849Z 2025-05-07T19:45:24.6232852Z 2025-05-07T19:45:24.6232856Z 2025-05-07T19:45:24.6232859Z 2025-05-07T19:45:24.6232862Z 2025-05-07T19:45:24.6232888Z 2025-05-07T19:45:24.6233066Z  2025-05-07T19:45:24.6233295Z 2025-05-07T19:45:24.6233299Z 2025-05-07T19:45:24.6233403Z  2025-05-07T19:45:24.6233544Z 2025-05-07T19:45:24.6233548Z 2025-05-07T19:45:24.6233654Z  2025-05-07T19:45:24.6233778Z 2025-05-07T19:45:24.6233781Z 2025-05-07T19:45:24.6233785Z 2025-05-07T19:45:24.6256488Z  2025-05-07T19:45:24.6256976Z 2025-05-07T19:45:24.6256981Z 2025-05-07T19:45:24.6256985Z 2025-05-07T19:45:24.6256989Z 2025-05-07T19:45:24.6257238Z  2025-05-07T19:45:24.6257385Z 2025-05-07T19:45:24.6257389Z 2025-05-07T19:45:24.6257392Z 2025-05-07T19:45:24.6257396Z 2025-05-07T19:45:24.6257400Z 2025-05-07T19:45:24.6257533Z  2025-05-07T19:45:24.6257706Z 2025-05-07T19:45:24.6257710Z 2025-05-07T19:45:24.6257713Z 2025-05-07T19:45:24.6257717Z 2025-05-07T19:45:24.6257720Z 2025-05-07T19:45:24.6257724Z 2025-05-07T19:45:24.6257840Z  2025-05-07T19:45:24.6258006Z 2025-05-07T19:45:24.6258010Z 2025-05-07T19:45:24.6258013Z 2025-05-07T19:45:24.6258017Z 2025-05-07T19:45:24.6258020Z 2025-05-07T19:45:24.6258024Z 2025-05-07T19:45:24.6258027Z 2025-05-07T19:45:24.6258155Z  2025-05-07T19:45:24.6258314Z 2025-05-07T19:45:24.6258318Z 2025-05-07T19:45:24.6258322Z 2025-05-07T19:45:24.6258350Z 2025-05-07T19:45:24.6258354Z 2025-05-07T19:45:24.6258363Z 2025-05-07T19:45:24.6258373Z 2025-05-07T19:45:24.6258376Z 2025-05-07T19:45:24.6258511Z  2025-05-07T19:45:24.6258673Z 2025-05-07T19:45:24.6258677Z 2025-05-07T19:45:24.6258681Z 2025-05-07T19:45:24.6258684Z 2025-05-07T19:45:24.6258688Z 2025-05-07T19:45:24.6258691Z 2025-05-07T19:45:24.6258725Z 2025-05-07T19:45:24.6258729Z 2025-05-07T19:45:24.6258732Z 2025-05-07T19:45:24.6258865Z  2025-05-07T19:45:24.6259033Z 2025-05-07T19:45:24.6259037Z 2025-05-07T19:45:24.6259041Z 2025-05-07T19:45:24.6259044Z 2025-05-07T19:45:24.6259048Z 2025-05-07T19:45:24.6259051Z 2025-05-07T19:45:24.6259055Z 2025-05-07T19:45:24.6259059Z 2025-05-07T19:45:24.6259085Z 2025-05-07T19:45:24.6259089Z 2025-05-07T19:45:24.6259224Z  2025-05-07T19:45:24.6259403Z 2025-05-07T19:45:24.6259406Z 2025-05-07T19:45:24.6259410Z 2025-05-07T19:45:24.6259413Z 2025-05-07T19:45:24.6259417Z 2025-05-07T19:45:24.6259420Z 2025-05-07T19:45:24.6259424Z 2025-05-07T19:45:24.6259434Z 2025-05-07T19:45:24.6259471Z 2025-05-07T19:45:24.6259474Z 2025-05-07T19:45:24.6259478Z 2025-05-07T19:45:24.6259619Z  2025-05-07T19:45:24.6259816Z 2025-05-07T19:45:24.6259820Z 2025-05-07T19:45:24.6259823Z 2025-05-07T19:45:24.6259826Z 2025-05-07T19:45:24.6259830Z 2025-05-07T19:45:24.6259833Z 2025-05-07T19:45:24.6259837Z 2025-05-07T19:45:24.6259841Z 2025-05-07T19:45:24.6259871Z 2025-05-07T19:45:24.6259874Z 2025-05-07T19:45:24.6259878Z 2025-05-07T19:45:24.6259882Z 2025-05-07T19:45:24.6260029Z  2025-05-07T19:45:24.6260229Z 2025-05-07T19:45:24.6260232Z 2025-05-07T19:45:24.6260236Z 2025-05-07T19:45:24.6260239Z 2025-05-07T19:45:24.6260243Z 2025-05-07T19:45:24.6260247Z 2025-05-07T19:45:24.6260251Z 2025-05-07T19:45:24.6260281Z 2025-05-07T19:45:24.6260284Z 2025-05-07T19:45:24.6260288Z 2025-05-07T19:45:24.6260291Z 2025-05-07T19:45:24.6260295Z 2025-05-07T19:45:24.6260298Z 2025-05-07T19:45:24.6260539Z  2025-05-07T19:45:24.6260753Z 2025-05-07T19:45:24.6260757Z 2025-05-07T19:45:24.6260761Z 2025-05-07T19:45:24.6260765Z 2025-05-07T19:45:24.6260769Z 2025-05-07T19:45:24.6260798Z 2025-05-07T19:45:24.6260801Z 2025-05-07T19:45:24.6260805Z 2025-05-07T19:45:24.6260808Z 2025-05-07T19:45:24.6260812Z 2025-05-07T19:45:24.6260815Z 2025-05-07T19:45:24.6260818Z 2025-05-07T19:45:24.6260822Z 2025-05-07T19:45:24.6260825Z 2025-05-07T19:45:24.6260988Z  2025-05-07T19:45:24.6261203Z 2025-05-07T19:45:24.6261240Z 2025-05-07T19:45:24.6261244Z 2025-05-07T19:45:24.6261247Z 2025-05-07T19:45:24.6261250Z 2025-05-07T19:45:24.6261254Z 2025-05-07T19:45:24.6261257Z 2025-05-07T19:45:24.6261261Z 2025-05-07T19:45:24.6261264Z 2025-05-07T19:45:24.6261268Z 2025-05-07T19:45:24.6261271Z 2025-05-07T19:45:24.6261274Z 2025-05-07T19:45:24.6261278Z 2025-05-07T19:45:24.6261282Z 2025-05-07T19:45:24.6261285Z 2025-05-07T19:45:24.6261448Z  2025-05-07T19:45:24.6261766Z 2025-05-07T19:45:24.6261770Z 2025-05-07T19:45:24.6261773Z 2025-05-07T19:45:24.6261777Z 2025-05-07T19:45:24.6261781Z 2025-05-07T19:45:24.6261784Z 2025-05-07T19:45:24.6261788Z 2025-05-07T19:45:24.6261792Z 2025-05-07T19:45:24.6261795Z 2025-05-07T19:45:24.6261799Z 2025-05-07T19:45:24.6261802Z 2025-05-07T19:45:24.6261805Z 2025-05-07T19:45:24.6261808Z 2025-05-07T19:45:24.6261812Z 2025-05-07T19:45:24.6261815Z 2025-05-07T19:45:24.6261818Z 2025-05-07T19:45:24.6262006Z  2025-05-07T19:45:24.6262231Z 2025-05-07T19:45:24.6262234Z 2025-05-07T19:45:24.6262238Z 2025-05-07T19:45:24.6262241Z 2025-05-07T19:45:24.6262245Z 2025-05-07T19:45:24.6262248Z 2025-05-07T19:45:24.6262252Z 2025-05-07T19:45:24.6262255Z 2025-05-07T19:45:24.6262259Z 2025-05-07T19:45:24.6262262Z 2025-05-07T19:45:24.6262266Z 2025-05-07T19:45:24.6262269Z 2025-05-07T19:45:24.6262274Z 2025-05-07T19:45:24.6262278Z 2025-05-07T19:45:24.6262286Z 2025-05-07T19:45:24.6262320Z 2025-05-07T19:45:24.6262323Z 2025-05-07T19:45:24.6262482Z  2025-05-07T19:45:24.6262712Z 2025-05-07T19:45:24.6262715Z 2025-05-07T19:45:24.6262719Z 2025-05-07T19:45:24.6262722Z 2025-05-07T19:45:24.6262726Z 2025-05-07T19:45:24.6262729Z 2025-05-07T19:45:24.6262733Z 2025-05-07T19:45:24.6262736Z 2025-05-07T19:45:24.6262739Z 2025-05-07T19:45:24.6262743Z 2025-05-07T19:45:24.6262746Z 2025-05-07T19:45:24.6262750Z 2025-05-07T19:45:24.6262755Z 2025-05-07T19:45:24.6262782Z 2025-05-07T19:45:24.6262785Z 2025-05-07T19:45:24.6262790Z 2025-05-07T19:45:24.6262793Z 2025-05-07T19:45:24.6262796Z 2025-05-07T19:45:24.6262969Z  2025-05-07T19:45:24.6263228Z 2025-05-07T19:45:24.6263232Z 2025-05-07T19:45:24.6263346Z  2025-05-07T19:45:24.6263457Z 2025-05-07T19:45:24.6263462Z 2025-05-07T19:45:24.6263564Z  2025-05-07T19:45:24.6263687Z 2025-05-07T19:45:24.6263695Z 2025-05-07T19:45:24.6263702Z 2025-05-07T19:45:24.6263807Z  2025-05-07T19:45:24.6263921Z 2025-05-07T19:45:24.6263925Z 2025-05-07T19:45:24.6263928Z 2025-05-07T19:45:24.6263932Z 2025-05-07T19:45:24.6264058Z  2025-05-07T19:45:24.6264176Z 2025-05-07T19:45:24.6264180Z 2025-05-07T19:45:24.6264183Z 2025-05-07T19:45:24.6264187Z 2025-05-07T19:45:24.6264190Z 2025-05-07T19:45:24.6264294Z  2025-05-07T19:45:24.6264432Z 2025-05-07T19:45:24.6264436Z 2025-05-07T19:45:24.6264439Z 2025-05-07T19:45:24.6264443Z 2025-05-07T19:45:24.6264447Z 2025-05-07T19:45:24.6264450Z 2025-05-07T19:45:24.6264557Z  2025-05-07T19:45:24.6264685Z 2025-05-07T19:45:24.6264689Z 2025-05-07T19:45:24.6264692Z 2025-05-07T19:45:24.6264705Z 2025-05-07T19:45:24.6264708Z 2025-05-07T19:45:24.6264712Z 2025-05-07T19:45:24.6264716Z 2025-05-07T19:45:24.6264825Z  2025-05-07T19:45:24.6264964Z 2025-05-07T19:45:24.6264968Z 2025-05-07T19:45:24.6264971Z 2025-05-07T19:45:24.6265047Z 2025-05-07T19:45:24.6265055Z 2025-05-07T19:45:24.6265059Z 2025-05-07T19:45:24.6265063Z 2025-05-07T19:45:24.6265080Z 2025-05-07T19:45:24.6265197Z  2025-05-07T19:45:24.6265348Z 2025-05-07T19:45:24.6265351Z 2025-05-07T19:45:24.6265355Z 2025-05-07T19:45:24.6265358Z 2025-05-07T19:45:24.6265362Z 2025-05-07T19:45:24.6265365Z 2025-05-07T19:45:24.6265369Z 2025-05-07T19:45:24.6265372Z 2025-05-07T19:45:24.6265376Z 2025-05-07T19:45:24.6265500Z  2025-05-07T19:45:24.6265658Z 2025-05-07T19:45:24.6265662Z 2025-05-07T19:45:24.6265665Z 2025-05-07T19:45:24.6265668Z 2025-05-07T19:45:24.6265672Z 2025-05-07T19:45:24.6265675Z 2025-05-07T19:45:24.6265679Z 2025-05-07T19:45:24.6265682Z 2025-05-07T19:45:24.6265687Z 2025-05-07T19:45:24.6265690Z 2025-05-07T19:45:24.6265858Z  2025-05-07T19:45:24.6266031Z 2025-05-07T19:45:24.6266036Z 2025-05-07T19:45:24.6266040Z 2025-05-07T19:45:24.6266043Z 2025-05-07T19:45:24.6266051Z 2025-05-07T19:45:24.6266112Z 2025-05-07T19:45:24.6266116Z 2025-05-07T19:45:24.6266119Z 2025-05-07T19:45:24.6266123Z 2025-05-07T19:45:24.6266127Z 2025-05-07T19:45:24.6266130Z 2025-05-07T19:45:24.6266295Z  2025-05-07T19:45:24.6266482Z 2025-05-07T19:45:24.6266486Z 2025-05-07T19:45:24.6266490Z 2025-05-07T19:45:24.6266493Z 2025-05-07T19:45:24.6266497Z 2025-05-07T19:45:24.6266500Z 2025-05-07T19:45:24.6266505Z 2025-05-07T19:45:24.6266509Z 2025-05-07T19:45:24.6266512Z 2025-05-07T19:45:24.6266516Z 2025-05-07T19:45:24.6266520Z 2025-05-07T19:45:24.6266524Z 2025-05-07T19:45:24.6266684Z  2025-05-07T19:45:24.6266885Z 2025-05-07T19:45:24.6266889Z 2025-05-07T19:45:24.6266892Z 2025-05-07T19:45:24.6266896Z 2025-05-07T19:45:24.6266899Z 2025-05-07T19:45:24.6266902Z 2025-05-07T19:45:24.6266906Z 2025-05-07T19:45:24.6266910Z 2025-05-07T19:45:24.6266913Z 2025-05-07T19:45:24.6266917Z 2025-05-07T19:45:24.6266921Z 2025-05-07T19:45:24.6266953Z 2025-05-07T19:45:24.6266961Z 2025-05-07T19:45:24.6267109Z  2025-05-07T19:45:24.6267308Z 2025-05-07T19:45:24.6267312Z 2025-05-07T19:45:24.6267315Z 2025-05-07T19:45:24.6267319Z 2025-05-07T19:45:24.6267323Z 2025-05-07T19:45:24.6267327Z 2025-05-07T19:45:24.6267331Z 2025-05-07T19:45:24.6267335Z 2025-05-07T19:45:24.6267338Z 2025-05-07T19:45:24.6267365Z 2025-05-07T19:45:24.6267368Z 2025-05-07T19:45:24.6267371Z 2025-05-07T19:45:24.6267375Z 2025-05-07T19:45:24.6267378Z 2025-05-07T19:45:24.6267526Z  2025-05-07T19:45:24.6267734Z 2025-05-07T19:45:24.6267738Z 2025-05-07T19:45:24.6267742Z 2025-05-07T19:45:24.6267745Z 2025-05-07T19:45:24.6267749Z 2025-05-07T19:45:24.6267770Z 2025-05-07T19:45:24.6267773Z 2025-05-07T19:45:24.6267776Z 2025-05-07T19:45:24.6267780Z 2025-05-07T19:45:24.6267783Z 2025-05-07T19:45:24.6267787Z 2025-05-07T19:45:24.6267790Z 2025-05-07T19:45:24.6267794Z 2025-05-07T19:45:24.6267797Z 2025-05-07T19:45:24.6267804Z 2025-05-07T19:45:24.6268087Z  2025-05-07T19:45:24.6268296Z 2025-05-07T19:45:24.6268321Z 2025-05-07T19:45:24.6268324Z 2025-05-07T19:45:24.6268328Z 2025-05-07T19:45:24.6268333Z 2025-05-07T19:45:24.6268336Z 2025-05-07T19:45:24.6268339Z 2025-05-07T19:45:24.6268343Z 2025-05-07T19:45:24.6268346Z 2025-05-07T19:45:24.6268350Z 2025-05-07T19:45:24.6268353Z 2025-05-07T19:45:24.6268356Z 2025-05-07T19:45:24.6268360Z 2025-05-07T19:45:24.6268363Z 2025-05-07T19:45:24.6268366Z 2025-05-07T19:45:24.6268369Z 2025-05-07T19:45:24.6268529Z  2025-05-07T19:45:24.6268763Z 2025-05-07T19:45:24.6268768Z 2025-05-07T19:45:24.6268771Z 2025-05-07T19:45:24.6268774Z 2025-05-07T19:45:24.6268778Z 2025-05-07T19:45:24.6268781Z 2025-05-07T19:45:24.6268784Z 2025-05-07T19:45:24.6268787Z 2025-05-07T19:45:24.6268791Z 2025-05-07T19:45:24.6268794Z 2025-05-07T19:45:24.6268797Z 2025-05-07T19:45:24.6268800Z 2025-05-07T19:45:24.6268861Z 2025-05-07T19:45:24.6268869Z 2025-05-07T19:45:24.6268872Z 2025-05-07T19:45:24.6268876Z 2025-05-07T19:45:24.6268879Z 2025-05-07T19:45:24.6269052Z  2025-05-07T19:45:24.6269266Z 2025-05-07T19:45:24.6269270Z 2025-05-07T19:45:24.6269273Z 2025-05-07T19:45:24.6269277Z 2025-05-07T19:45:24.6269280Z 2025-05-07T19:45:24.6269284Z 2025-05-07T19:45:24.6269287Z 2025-05-07T19:45:24.6269290Z 2025-05-07T19:45:24.6269294Z 2025-05-07T19:45:24.6269297Z 2025-05-07T19:45:24.6269300Z 2025-05-07T19:45:24.6269303Z 2025-05-07T19:45:24.6269307Z 2025-05-07T19:45:24.6269323Z 2025-05-07T19:45:24.6269326Z 2025-05-07T19:45:24.6269330Z 2025-05-07T19:45:24.6269333Z 2025-05-07T19:45:24.6269336Z 2025-05-07T19:45:24.6269499Z  2025-05-07T19:45:24.6269720Z 2025-05-07T19:45:24.6269723Z 2025-05-07T19:45:24.6269841Z  2025-05-07T19:45:24.6269937Z 2025-05-07T19:45:24.6269940Z 2025-05-07T19:45:24.6270038Z  2025-05-07T19:45:24.6270207Z 2025-05-07T19:45:24.6270211Z 2025-05-07T19:45:24.6270215Z 2025-05-07T19:45:24.6270318Z  2025-05-07T19:45:24.6270434Z 2025-05-07T19:45:24.6270438Z 2025-05-07T19:45:24.6270441Z 2025-05-07T19:45:24.6270445Z 2025-05-07T19:45:24.6270568Z  2025-05-07T19:45:24.6270687Z 2025-05-07T19:45:24.6270690Z 2025-05-07T19:45:24.6270694Z 2025-05-07T19:45:24.6270698Z 2025-05-07T19:45:24.6270701Z 2025-05-07T19:45:24.6270813Z  2025-05-07T19:45:24.6270959Z 2025-05-07T19:45:24.6270962Z 2025-05-07T19:45:24.6270966Z 2025-05-07T19:45:24.6270969Z 2025-05-07T19:45:24.6270973Z 2025-05-07T19:45:24.6270976Z 2025-05-07T19:45:24.6271088Z  2025-05-07T19:45:24.6271219Z 2025-05-07T19:45:24.6271222Z 2025-05-07T19:45:24.6271226Z 2025-05-07T19:45:24.6271252Z 2025-05-07T19:45:24.6271256Z 2025-05-07T19:45:24.6271259Z 2025-05-07T19:45:24.6271262Z 2025-05-07T19:45:24.6271374Z  2025-05-07T19:45:24.6271518Z 2025-05-07T19:45:24.6271525Z 2025-05-07T19:45:24.6271532Z 2025-05-07T19:45:24.6271536Z 2025-05-07T19:45:24.6271539Z 2025-05-07T19:45:24.6271543Z 2025-05-07T19:45:24.6271546Z 2025-05-07T19:45:24.6271571Z 2025-05-07T19:45:24.6271692Z  2025-05-07T19:45:24.6271846Z 2025-05-07T19:45:24.6271849Z 2025-05-07T19:45:24.6271853Z 2025-05-07T19:45:24.6271856Z 2025-05-07T19:45:24.6271859Z 2025-05-07T19:45:24.6271863Z 2025-05-07T19:45:24.6271866Z 2025-05-07T19:45:24.6271869Z 2025-05-07T19:45:24.6271873Z 2025-05-07T19:45:24.6272018Z  2025-05-07T19:45:24.6272181Z 2025-05-07T19:45:24.6272185Z 2025-05-07T19:45:24.6272188Z 2025-05-07T19:45:24.6272191Z 2025-05-07T19:45:24.6272195Z 2025-05-07T19:45:24.6272198Z 2025-05-07T19:45:24.6272201Z 2025-05-07T19:45:24.6272204Z 2025-05-07T19:45:24.6272208Z 2025-05-07T19:45:24.6272211Z 2025-05-07T19:45:24.6272354Z  2025-05-07T19:45:24.6272522Z 2025-05-07T19:45:24.6272525Z 2025-05-07T19:45:24.6272533Z 2025-05-07T19:45:24.6272540Z 2025-05-07T19:45:24.6272543Z 2025-05-07T19:45:24.6272547Z 2025-05-07T19:45:24.6272550Z 2025-05-07T19:45:24.6272553Z 2025-05-07T19:45:24.6272557Z 2025-05-07T19:45:24.6272560Z 2025-05-07T19:45:24.6272563Z 2025-05-07T19:45:24.6272719Z  2025-05-07T19:45:24.6272905Z 2025-05-07T19:45:24.6272908Z 2025-05-07T19:45:24.6272912Z 2025-05-07T19:45:24.6272915Z 2025-05-07T19:45:24.6272919Z 2025-05-07T19:45:24.6272922Z 2025-05-07T19:45:24.6272925Z 2025-05-07T19:45:24.6272928Z 2025-05-07T19:45:24.6272932Z 2025-05-07T19:45:24.6272935Z 2025-05-07T19:45:24.6272938Z 2025-05-07T19:45:24.6272942Z 2025-05-07T19:45:24.6273104Z  2025-05-07T19:45:24.6273291Z 2025-05-07T19:45:24.6273294Z 2025-05-07T19:45:24.6273298Z 2025-05-07T19:45:24.6273301Z 2025-05-07T19:45:24.6273305Z 2025-05-07T19:45:24.6273308Z 2025-05-07T19:45:24.6273311Z 2025-05-07T19:45:24.6273315Z 2025-05-07T19:45:24.6273318Z 2025-05-07T19:45:24.6273394Z 2025-05-07T19:45:24.6273401Z 2025-05-07T19:45:24.6273422Z 2025-05-07T19:45:24.6273425Z 2025-05-07T19:45:24.6273570Z  2025-05-07T19:45:24.6273763Z 2025-05-07T19:45:24.6273766Z 2025-05-07T19:45:24.6273770Z 2025-05-07T19:45:24.6273773Z 2025-05-07T19:45:24.6273776Z 2025-05-07T19:45:24.6273781Z 2025-05-07T19:45:24.6273784Z 2025-05-07T19:45:24.6273788Z 2025-05-07T19:45:24.6273791Z 2025-05-07T19:45:24.6273818Z 2025-05-07T19:45:24.6273821Z 2025-05-07T19:45:24.6273824Z 2025-05-07T19:45:24.6273828Z 2025-05-07T19:45:24.6273833Z 2025-05-07T19:45:24.6273980Z  2025-05-07T19:45:24.6274173Z 2025-05-07T19:45:24.6274177Z 2025-05-07T19:45:24.6274180Z 2025-05-07T19:45:24.6274184Z 2025-05-07T19:45:24.6274187Z 2025-05-07T19:45:24.6274212Z 2025-05-07T19:45:24.6274215Z 2025-05-07T19:45:24.6274218Z 2025-05-07T19:45:24.6274222Z 2025-05-07T19:45:24.6274225Z 2025-05-07T19:45:24.6274228Z 2025-05-07T19:45:24.6274236Z 2025-05-07T19:45:24.6274295Z 2025-05-07T19:45:24.6274298Z 2025-05-07T19:45:24.6274344Z 2025-05-07T19:45:24.6274499Z  2025-05-07T19:45:24.6275392Z 2025-05-07T19:45:24.6275396Z 2025-05-07T19:45:24.6275399Z 2025-05-07T19:45:24.6275403Z 2025-05-07T19:45:24.6275407Z 2025-05-07T19:45:24.6275410Z 2025-05-07T19:45:24.6275414Z 2025-05-07T19:45:24.6275417Z 2025-05-07T19:45:24.6275421Z 2025-05-07T19:45:24.6275425Z 2025-05-07T19:45:24.6275428Z 2025-05-07T19:45:24.6275432Z 2025-05-07T19:45:24.6275435Z 2025-05-07T19:45:24.6275438Z 2025-05-07T19:45:24.6275441Z 2025-05-07T19:45:24.6275445Z 2025-05-07T19:45:24.6275684Z  2025-05-07T19:45:24.6275901Z 2025-05-07T19:45:24.6275905Z 2025-05-07T19:45:24.6275909Z 2025-05-07T19:45:24.6275912Z 2025-05-07T19:45:24.6275916Z 2025-05-07T19:45:24.6275919Z 2025-05-07T19:45:24.6275923Z 2025-05-07T19:45:24.6275927Z 2025-05-07T19:45:24.6275930Z 2025-05-07T19:45:24.6275938Z 2025-05-07T19:45:24.6275946Z 2025-05-07T19:45:24.6275949Z 2025-05-07T19:45:24.6275974Z 2025-05-07T19:45:24.6275978Z 2025-05-07T19:45:24.6275981Z 2025-05-07T19:45:24.6275984Z 2025-05-07T19:45:24.6275988Z 2025-05-07T19:45:24.6276153Z  2025-05-07T19:45:24.6276373Z 2025-05-07T19:45:24.6276378Z 2025-05-07T19:45:24.6276382Z 2025-05-07T19:45:24.6276386Z 2025-05-07T19:45:24.6276390Z 2025-05-07T19:45:24.6276393Z 2025-05-07T19:45:24.6276409Z 2025-05-07T19:45:24.6276413Z 2025-05-07T19:45:24.6276416Z 2025-05-07T19:45:24.6276420Z 2025-05-07T19:45:24.6276423Z 2025-05-07T19:45:24.6276426Z 2025-05-07T19:45:24.6276430Z 2025-05-07T19:45:24.6276433Z 2025-05-07T19:45:24.6276437Z 2025-05-07T19:45:24.6276440Z 2025-05-07T19:45:24.6276444Z 2025-05-07T19:45:24.6276447Z 2025-05-07T19:45:24.6276624Z  2025-05-07T19:45:24.6276878Z 2025-05-07T19:45:24.6276882Z 2025-05-07T19:45:24.6276988Z  2025-05-07T19:45:24.6277110Z 2025-05-07T19:45:24.6277117Z 2025-05-07T19:45:24.6277250Z  2025-05-07T19:45:24.6277369Z 2025-05-07T19:45:24.6277374Z 2025-05-07T19:45:24.6277378Z 2025-05-07T19:45:24.6277493Z  done 2025-05-07T19:45:24.9363070Z Preparing transaction: | / - done 2025-05-07T19:45:28.7838719Z Verifying transaction: | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - done 2025-05-07T19:45:31.5998465Z Executing transaction: | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ done 2025-05-07T19:45:32.0131502Z [INSTALL] Adding symlink librhash.so.0, which is needed by CMake ... 2025-05-07T19:45:33.9328215Z + ln -s /github/home/miniconda/envs/build_binary/lib/librhash.so /github/home/miniconda/envs/build_binary/lib/librhash.so.0 2025-05-07T19:45:33.9329149Z 2025-05-07T19:45:33.9337320Z 2025-05-07T19:45:33.9363270Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install build 2025-05-07T19:45:36.2719586Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-05-07T19:45:36.2721053Z 2025-05-07T19:45:36.2721188Z Collecting build 2025-05-07T19:45:36.2721551Z Downloading build-1.2.2.post1-py3-none-any.whl.metadata (6.5 kB) 2025-05-07T19:45:36.2722433Z Requirement already satisfied: packaging>=19.1 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from build) (25.0) 2025-05-07T19:45:36.2723164Z Collecting pyproject_hooks (from build) 2025-05-07T19:45:36.2723641Z Downloading pyproject_hooks-1.2.0-py3-none-any.whl.metadata (1.3 kB) 2025-05-07T19:45:36.2724183Z Downloading build-1.2.2.post1-py3-none-any.whl (22 kB) 2025-05-07T19:45:36.2724660Z Downloading pyproject_hooks-1.2.0-py3-none-any.whl (10 kB) 2025-05-07T19:45:36.2725551Z Installing collected packages: pyproject_hooks, build 2025-05-07T19:45:36.2725809Z 2025-05-07T19:45:36.2726002Z Successfully installed build-1.2.2.post1 pyproject_hooks-1.2.0 2025-05-07T19:45:36.2726320Z 2025-05-07T19:45:38.1371023Z /github/home/miniconda/envs/build_binary/bin/make 2025-05-07T19:45:38.1371338Z 2025-05-07T19:45:38.2171788Z [CHECK] Binary make found in PATH 2025-05-07T19:45:40.0442066Z /github/home/miniconda/envs/build_binary/bin/cmake 2025-05-07T19:45:40.0442420Z 2025-05-07T19:45:40.1199123Z [CHECK] Binary cmake found in PATH 2025-05-07T19:45:41.9560154Z /github/home/miniconda/envs/build_binary/bin/ninja 2025-05-07T19:45:41.9561338Z 2025-05-07T19:45:42.0155119Z [CHECK] Binary ninja found in PATH 2025-05-07T19:45:43.9254658Z [CHECK] Python (sub-)package 'click' found ... 2025-05-07T19:45:45.9768738Z [CHECK] Python (sub-)package 'hypothesis' found ... 2025-05-07T19:45:47.9332294Z [CHECK] Python (sub-)package 'jinja2' found ... 2025-05-07T19:45:49.9558157Z [CHECK] Python (sub-)package 'skbuild' found ... 2025-05-07T19:45:51.8384051Z [CHECK] Python (sub-)package 'wheel' found ... 2025-05-07T19:45:51.8385322Z [INSTALL] Successfully installed all the build tools 2025-05-07T19:45:51.8452975Z ##[group]Run . $PRELUDE; install_cuda $BUILD_ENV 12.8.0 2025-05-07T19:45:51.8453423Z . $PRELUDE; install_cuda $BUILD_ENV 12.8.0 2025-05-07T19:45:51.8453991Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:45:51.8454323Z env: 2025-05-07T19:45:51.8454547Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:45:51.8454860Z BUILD_ENV: build_binary 2025-05-07T19:45:51.8455193Z BUILD_TARGET: genai 2025-05-07T19:45:51.8455599Z BUILD_VARIANT: cuda 2025-05-07T19:45:51.8455857Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:45:51.8456181Z ##[endgroup] 2025-05-07T19:45:52.2858298Z ################################################################################ 2025-05-07T19:45:52.2858765Z # Install CUDA 2025-05-07T19:45:52.2859052Z # 2025-05-07T19:45:52.2876414Z # [2025-05-07T19:45:52.286Z] + install_cuda build_binary 12.8.0 2025-05-07T19:45:52.2877655Z ################################################################################ 2025-05-07T19:45:52.2878510Z 2025-05-07T19:45:52.2888597Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:45:52.3771426Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:45:52.3772795Z [SETUP] Cleaning up Conda packages ... 2025-05-07T19:45:52.3776340Z + conda clean --packages --tarball -y 2025-05-07T19:45:52.3776570Z 2025-05-07T19:45:52.8766874Z Will remove 144 (595.4 MB) tarball(s). 2025-05-07T19:45:52.8768018Z Will remove 19 (4.7 MB) package(s). 2025-05-07T19:45:52.9337530Z 2025-05-07T19:45:52.9343947Z + conda clean --all -y 2025-05-07T19:45:52.9344592Z 2025-05-07T19:45:53.5610006Z There are no unused tarball(s) to remove. 2025-05-07T19:45:53.5611101Z Will remove 1 index cache(s). 2025-05-07T19:45:53.5611995Z There are no unused package(s) to remove. 2025-05-07T19:45:53.5612964Z There are no tempfile(s) to remove. 2025-05-07T19:45:53.5613850Z There are no logfile(s) to remove. 2025-05-07T19:45:53.6185597Z 2025-05-07T19:45:53.6195187Z [INSTALL] Installing CUDA 12.8.0 ... 2025-05-07T19:45:53.6222908Z [EXEC] [ATTEMPT 0/3] + conda install --force-reinstall -n build_binary -c conda-forge --override-channels -y cuda=12.8.0 2025-05-07T19:45:54.4706792Z Channels: 2025-05-07T19:45:54.4707212Z - conda-forge 2025-05-07T19:45:54.4707563Z Platform: linux-64 2025-05-07T19:46:04.2152548Z Collecting package metadata (repodata.json): - \ | / - \ | / - \ | / - \ | / - done 2025-05-07T19:46:05.7834968Z Solving environment: | / - \ done 2025-05-07T19:46:05.9199788Z 2025-05-07T19:46:05.9200807Z ## Package Plan ## 2025-05-07T19:46:05.9201473Z 2025-05-07T19:46:05.9202084Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:46:05.9203059Z 2025-05-07T19:46:05.9203370Z added / updated specs: 2025-05-07T19:46:05.9204970Z - cuda=12.8.0 2025-05-07T19:46:05.9205362Z 2025-05-07T19:46:05.9205374Z 2025-05-07T19:46:05.9205732Z The following packages will be downloaded: 2025-05-07T19:46:05.9206387Z 2025-05-07T19:46:05.9206860Z package | build 2025-05-07T19:46:05.9207193Z ---------------------------|----------------- 2025-05-07T19:46:05.9207573Z attr-2.5.1 | h166bdaf_1 69 KB conda-forge 2025-05-07T19:46:05.9207991Z binutils-2.40 | h4852527_7 31 KB conda-forge 2025-05-07T19:46:05.9208458Z c-compiler-1.5.2 | h0b41bf4_0 6 KB conda-forge 2025-05-07T19:46:05.9208894Z cuda-12.8.0 | ha804496_0 26 KB conda-forge 2025-05-07T19:46:05.9209428Z cuda-cccl_linux-64-12.8.55 | ha770c72_1 1.0 MB conda-forge 2025-05-07T19:46:05.9209928Z cuda-command-line-tools-12.8.0| ha770c72_0 20 KB conda-forge 2025-05-07T19:46:05.9210468Z cuda-compiler-12.8.0 | hbad6d8a_0 20 KB conda-forge 2025-05-07T19:46:05.9210947Z cuda-crt-dev_linux-64-12.8.61| ha770c72_1 90 KB conda-forge 2025-05-07T19:46:05.9211669Z cuda-crt-tools-12.8.61 | ha770c72_1 27 KB conda-forge 2025-05-07T19:46:05.9212157Z cuda-cudart-12.8.57 | h5888daf_1 22 KB conda-forge 2025-05-07T19:46:05.9212623Z cuda-cudart-dev-12.8.57 | h5888daf_1 23 KB conda-forge 2025-05-07T19:46:05.9213146Z cuda-cudart-dev_linux-64-12.8.57| h3f2d84a_1 377 KB conda-forge 2025-05-07T19:46:05.9213667Z cuda-cudart-static-12.8.57 | h5888daf_1 22 KB conda-forge 2025-05-07T19:46:05.9214223Z cuda-cudart-static_linux-64-12.8.57| h3f2d84a_1 950 KB conda-forge 2025-05-07T19:46:05.9214774Z cuda-cudart_linux-64-12.8.57| h3f2d84a_1 188 KB conda-forge 2025-05-07T19:46:05.9215371Z cuda-cuobjdump-12.8.55 | hbd13f7d_0 227 KB conda-forge 2025-05-07T19:46:05.9216078Z cuda-cupti-12.8.57 | hbd13f7d_0 1.8 MB conda-forge 2025-05-07T19:46:05.9216671Z cuda-cupti-dev-12.8.57 | h5888daf_0 4.0 MB conda-forge 2025-05-07T19:46:05.9217206Z cuda-cuxxfilt-12.8.55 | hbd13f7d_0 211 KB conda-forge 2025-05-07T19:46:05.9217714Z cuda-driver-dev-12.8.57 | h5888daf_1 22 KB conda-forge 2025-05-07T19:46:05.9218276Z cuda-driver-dev_linux-64-12.8.90| h3f2d84a_1 36 KB conda-forge 2025-05-07T19:46:05.9218815Z cuda-gdb-12.8.55 | h50b4baa_0 353 KB conda-forge 2025-05-07T19:46:05.9219296Z cuda-libraries-12.8.0 | ha770c72_0 20 KB conda-forge 2025-05-07T19:46:05.9219840Z cuda-libraries-dev-12.8.0 | ha770c72_0 20 KB conda-forge 2025-05-07T19:46:05.9220348Z cuda-nsight-12.8.55 | h7938cbb_0 113.2 MB conda-forge 2025-05-07T19:46:05.9220849Z cuda-nvcc-12.8.61 | hcdd1206_0 23 KB conda-forge 2025-05-07T19:46:05.9221349Z cuda-nvcc-dev_linux-64-12.8.61| he91c749_1 12.7 MB conda-forge 2025-05-07T19:46:05.9222007Z cuda-nvcc-impl-12.8.61 | h85509e4_1 25 KB conda-forge 2025-05-07T19:46:05.9222500Z cuda-nvcc-tools-12.8.61 | he02047a_1 24.5 MB conda-forge 2025-05-07T19:46:05.9222970Z cuda-nvcc_linux-64-12.8.61 | h04802cd_0 25 KB conda-forge 2025-05-07T19:46:05.9223460Z cuda-nvdisasm-12.8.55 | hbd13f7d_0 4.9 MB conda-forge 2025-05-07T19:46:05.9223914Z cuda-nvml-dev-12.8.55 | hbd13f7d_0 134 KB conda-forge 2025-05-07T19:46:05.9224389Z cuda-nvprof-12.8.57 | hbd13f7d_0 2.5 MB conda-forge 2025-05-07T19:46:05.9224867Z cuda-nvprune-12.8.55 | hbd13f7d_0 68 KB conda-forge 2025-05-07T19:46:05.9225313Z cuda-nvrtc-12.8.61 | hbd13f7d_0 63.1 MB conda-forge 2025-05-07T19:46:05.9225892Z cuda-nvrtc-dev-12.8.61 | h5888daf_0 34 KB conda-forge 2025-05-07T19:46:05.9226345Z cuda-nvtx-12.8.55 | hbd13f7d_0 31 KB conda-forge 2025-05-07T19:46:05.9226843Z cuda-nvvm-dev_linux-64-12.8.61| ha770c72_1 25 KB conda-forge 2025-05-07T19:46:05.9227325Z cuda-nvvm-impl-12.8.61 | he02047a_1 20.8 MB conda-forge 2025-05-07T19:46:05.9227831Z cuda-nvvm-tools-12.8.61 | he02047a_1 23.5 MB conda-forge 2025-05-07T19:46:05.9228309Z cuda-nvvp-12.8.57 | hbd13f7d_0 112.4 MB conda-forge 2025-05-07T19:46:05.9228744Z cuda-opencl-12.8.55 | hbd13f7d_0 29 KB conda-forge 2025-05-07T19:46:05.9229236Z cuda-opencl-dev-12.8.55 | h5888daf_0 95 KB conda-forge 2025-05-07T19:46:05.9229715Z cuda-profiler-api-12.8.55 | h7938cbb_0 22 KB conda-forge 2025-05-07T19:46:05.9230222Z cuda-runtime-12.8.0 | ha804496_0 20 KB conda-forge 2025-05-07T19:46:05.9230728Z cuda-sanitizer-api-12.8.55 | hbd13f7d_0 8.8 MB conda-forge 2025-05-07T19:46:05.9231332Z cuda-toolkit-12.8.0 | ha804496_0 20 KB conda-forge 2025-05-07T19:46:05.9231803Z cuda-tools-12.8.0 | ha770c72_0 19 KB conda-forge 2025-05-07T19:46:05.9232239Z cuda-version-12.8 | h5d125a7_3 21 KB conda-forge 2025-05-07T19:46:05.9232730Z cuda-visual-tools-12.8.0 | ha770c72_0 20 KB conda-forge 2025-05-07T19:46:05.9233197Z cxx-compiler-1.5.2 | hf52228f_0 6 KB conda-forge 2025-05-07T19:46:05.9233637Z dbus-1.13.6 | h5008d03_3 604 KB conda-forge 2025-05-07T19:46:05.9234048Z gcc-11.4.0 | h602e360_13 49 KB conda-forge 2025-05-07T19:46:05.9234454Z gds-tools-1.13.0.11 | h5888daf_0 37.9 MB conda-forge 2025-05-07T19:46:05.9234887Z gmp-6.3.0 | hac33072_2 449 KB conda-forge 2025-05-07T19:46:05.9235266Z gxx-11.4.0 | h602e360_13 49 KB conda-forge 2025-05-07T19:46:05.9235686Z libcap-2.75 | h39aace5_0 118 KB conda-forge 2025-05-07T19:46:05.9236106Z libcublas-12.8.3.14 | h9ab20c4_0 460.2 MB conda-forge 2025-05-07T19:46:05.9236582Z libcublas-dev-12.8.3.14 | h9ab20c4_0 89 KB conda-forge 2025-05-07T19:46:05.9237057Z libcufft-11.3.3.41 | hbd13f7d_0 147.4 MB conda-forge 2025-05-07T19:46:05.9237503Z libcufft-dev-11.3.3.41 | h5888daf_0 33 KB conda-forge 2025-05-07T19:46:05.9237976Z libcufile-1.13.0.11 | h12f29b5_0 939 KB conda-forge 2025-05-07T19:46:05.9238425Z libcufile-dev-1.13.0.11 | h5888daf_0 35 KB conda-forge 2025-05-07T19:46:05.9238903Z libcurand-10.3.9.55 | hbd13f7d_0 43.6 MB conda-forge 2025-05-07T19:46:05.9239387Z libcurand-dev-10.3.9.55 | h5888daf_0 265 KB conda-forge 2025-05-07T19:46:05.9239858Z libcusolver-11.7.2.55 | h9ab20c4_0 156.9 MB conda-forge 2025-05-07T19:46:05.9240358Z libcusolver-dev-11.7.2.55 | h9ab20c4_0 59 KB conda-forge 2025-05-07T19:46:05.9240831Z libcusparse-12.5.7.53 | hbd13f7d_0 164.9 MB conda-forge 2025-05-07T19:46:05.9241328Z libcusparse-dev-12.5.7.53 | h5888daf_0 51 KB conda-forge 2025-05-07T19:46:05.9241802Z libgcrypt-lib-1.11.0 | hb9d3cd8_2 572 KB conda-forge 2025-05-07T19:46:05.9242272Z libglvnd-1.7.0 | ha4b6fd6_2 129 KB conda-forge 2025-05-07T19:46:05.9242748Z libgpg-error-1.55 | h3f2d84a_0 305 KB conda-forge 2025-05-07T19:46:05.9243175Z libnl-3.11.0 | hb9d3cd8_0 724 KB conda-forge 2025-05-07T19:46:05.9243692Z libnpp-12.3.3.65 | hbd13f7d_0 130.6 MB conda-forge 2025-05-07T19:46:05.9244127Z libnpp-dev-12.3.3.65 | h5888daf_0 443 KB conda-forge 2025-05-07T19:46:05.9244586Z libnuma-2.0.18 | h4ab18f5_2 42 KB conda-forge 2025-05-07T19:46:05.9245021Z libnvfatbin-12.8.55 | hbd13f7d_0 793 KB conda-forge 2025-05-07T19:46:05.9245528Z libnvfatbin-dev-12.8.55 | h5888daf_0 26 KB conda-forge 2025-05-07T19:46:05.9246032Z libnvjitlink-12.8.61 | hbd13f7d_0 28.7 MB conda-forge 2025-05-07T19:46:05.9246511Z libnvjitlink-dev-12.8.61 | h5888daf_0 25 KB conda-forge 2025-05-07T19:46:05.9247009Z libnvjpeg-12.3.5.57 | h97fd463_0 3.0 MB conda-forge 2025-05-07T19:46:05.9247468Z libnvjpeg-dev-12.3.5.57 | ha770c72_0 31 KB conda-forge 2025-05-07T19:46:05.9247949Z libopengl-1.7.0 | ha4b6fd6_2 50 KB conda-forge 2025-05-07T19:46:05.9248420Z libsystemd0-257.4 | h4e0b6ca_1 477 KB conda-forge 2025-05-07T19:46:05.9248943Z libudev1-257.4 | hbe16f8c_1 141 KB conda-forge 2025-05-07T19:46:05.9249409Z libxkbcommon-1.7.0 | h2c5496b_1 579 KB conda-forge 2025-05-07T19:46:05.9249853Z libxkbfile-1.1.0 | h166bdaf_1 111 KB conda-forge 2025-05-07T19:46:05.9250295Z lz4-c-1.10.0 | h5888daf_1 163 KB conda-forge 2025-05-07T19:46:05.9250740Z nsight-compute-2025.1.0.14 | hb5ebaad_0 320.6 MB conda-forge 2025-05-07T19:46:05.9251207Z nspr-4.36 | h5888daf_0 225 KB conda-forge 2025-05-07T19:46:05.9251622Z nss-3.111 | h159eef7_0 1.9 MB conda-forge 2025-05-07T19:46:05.9252018Z ocl-icd-2.3.3 | hb9d3cd8_0 104 KB conda-forge 2025-05-07T19:46:05.9252500Z opencl-headers-2024.10.24 | h5888daf_0 53 KB conda-forge 2025-05-07T19:46:05.9252961Z rdma-core-57.0 | h5888daf_0 1.2 MB conda-forge 2025-05-07T19:46:05.9253407Z wayland-1.23.1 | h3e06ad9_0 314 KB conda-forge 2025-05-07T19:46:05.9253815Z xcb-util-0.4.1 | hb711507_2 19 KB conda-forge 2025-05-07T19:46:05.9254272Z xcb-util-cursor-0.1.5 | hb9d3cd8_0 20 KB conda-forge 2025-05-07T19:46:05.9254756Z xcb-util-image-0.4.0 | hb711507_2 24 KB conda-forge 2025-05-07T19:46:05.9255317Z xcb-util-keysyms-0.4.1 | hb711507_0 14 KB conda-forge 2025-05-07T19:46:05.9256029Z xcb-util-renderutil-0.3.10 | hb711507_0 17 KB conda-forge 2025-05-07T19:46:05.9256584Z xcb-util-wm-0.4.2 | hb711507_0 50 KB conda-forge 2025-05-07T19:46:05.9257099Z xkeyboard-config-2.44 | hb9d3cd8_0 384 KB conda-forge 2025-05-07T19:46:05.9257663Z xorg-libxcomposite-0.4.6 | hb9d3cd8_2 13 KB conda-forge 2025-05-07T19:46:05.9258185Z xorg-libxdamage-1.1.6 | hb9d3cd8_0 13 KB conda-forge 2025-05-07T19:46:05.9258665Z ------------------------------------------------------------ 2025-05-07T19:46:05.9259039Z Total: 1.86 GB 2025-05-07T19:46:05.9259297Z 2025-05-07T19:46:05.9259440Z The following NEW packages will be INSTALLED: 2025-05-07T19:46:05.9259685Z 2025-05-07T19:46:05.9259879Z attr conda-forge/linux-64::attr-2.5.1-h166bdaf_1 2025-05-07T19:46:05.9260361Z binutils conda-forge/linux-64::binutils-2.40-h4852527_7 2025-05-07T19:46:05.9260882Z c-compiler conda-forge/linux-64::c-compiler-1.5.2-h0b41bf4_0 2025-05-07T19:46:05.9261351Z cuda conda-forge/noarch::cuda-12.8.0-ha804496_0 2025-05-07T19:46:05.9261895Z cuda-cccl_linux-64 conda-forge/noarch::cuda-cccl_linux-64-12.8.55-ha770c72_1 2025-05-07T19:46:05.9262635Z cuda-command-line~ conda-forge/linux-64::cuda-command-line-tools-12.8.0-ha770c72_0 2025-05-07T19:46:05.9263302Z cuda-compiler conda-forge/noarch::cuda-compiler-12.8.0-hbad6d8a_0 2025-05-07T19:46:05.9263927Z cuda-crt-dev_linu~ conda-forge/noarch::cuda-crt-dev_linux-64-12.8.61-ha770c72_1 2025-05-07T19:46:05.9264541Z cuda-crt-tools conda-forge/linux-64::cuda-crt-tools-12.8.61-ha770c72_1 2025-05-07T19:46:05.9265127Z cuda-cudart conda-forge/linux-64::cuda-cudart-12.8.57-h5888daf_1 2025-05-07T19:46:05.9265694Z cuda-cudart-dev conda-forge/linux-64::cuda-cudart-dev-12.8.57-h5888daf_1 2025-05-07T19:46:05.9266350Z cuda-cudart-dev_l~ conda-forge/noarch::cuda-cudart-dev_linux-64-12.8.57-h3f2d84a_1 2025-05-07T19:46:05.9267046Z cuda-cudart-static conda-forge/linux-64::cuda-cudart-static-12.8.57-h5888daf_1 2025-05-07T19:46:05.9267718Z cuda-cudart-stati~ conda-forge/noarch::cuda-cudart-static_linux-64-12.8.57-h3f2d84a_1 2025-05-07T19:46:05.9268490Z cuda-cudart_linux~ conda-forge/noarch::cuda-cudart_linux-64-12.8.57-h3f2d84a_1 2025-05-07T19:46:05.9269061Z cuda-cuobjdump conda-forge/linux-64::cuda-cuobjdump-12.8.55-hbd13f7d_0 2025-05-07T19:46:05.9269712Z cuda-cupti conda-forge/linux-64::cuda-cupti-12.8.57-hbd13f7d_0 2025-05-07T19:46:05.9270259Z cuda-cupti-dev conda-forge/linux-64::cuda-cupti-dev-12.8.57-h5888daf_0 2025-05-07T19:46:05.9270804Z cuda-cuxxfilt conda-forge/linux-64::cuda-cuxxfilt-12.8.55-hbd13f7d_0 2025-05-07T19:46:05.9271381Z cuda-driver-dev conda-forge/linux-64::cuda-driver-dev-12.8.57-h5888daf_1 2025-05-07T19:46:05.9271999Z cuda-driver-dev_l~ conda-forge/noarch::cuda-driver-dev_linux-64-12.8.90-h3f2d84a_1 2025-05-07T19:46:05.9272541Z cuda-gdb conda-forge/linux-64::cuda-gdb-12.8.55-h50b4baa_0 2025-05-07T19:46:05.9273072Z cuda-libraries conda-forge/linux-64::cuda-libraries-12.8.0-ha770c72_0 2025-05-07T19:46:05.9273644Z cuda-libraries-dev conda-forge/linux-64::cuda-libraries-dev-12.8.0-ha770c72_0 2025-05-07T19:46:05.9274228Z cuda-nsight conda-forge/linux-64::cuda-nsight-12.8.55-h7938cbb_0 2025-05-07T19:46:05.9275098Z cuda-nvcc conda-forge/linux-64::cuda-nvcc-12.8.61-hcdd1206_0 2025-05-07T19:46:05.9275909Z cuda-nvcc-dev_lin~ conda-forge/noarch::cuda-nvcc-dev_linux-64-12.8.61-he91c749_1 2025-05-07T19:46:05.9276552Z cuda-nvcc-impl conda-forge/linux-64::cuda-nvcc-impl-12.8.61-h85509e4_1 2025-05-07T19:46:05.9277141Z cuda-nvcc-tools conda-forge/linux-64::cuda-nvcc-tools-12.8.61-he02047a_1 2025-05-07T19:46:05.9277783Z cuda-nvcc_linux-64 conda-forge/linux-64::cuda-nvcc_linux-64-12.8.61-h04802cd_0 2025-05-07T19:46:05.9278402Z cuda-nvdisasm conda-forge/linux-64::cuda-nvdisasm-12.8.55-hbd13f7d_0 2025-05-07T19:46:05.9278965Z cuda-nvml-dev conda-forge/linux-64::cuda-nvml-dev-12.8.55-hbd13f7d_0 2025-05-07T19:46:05.9279554Z cuda-nvprof conda-forge/linux-64::cuda-nvprof-12.8.57-hbd13f7d_0 2025-05-07T19:46:05.9280109Z cuda-nvprune conda-forge/linux-64::cuda-nvprune-12.8.55-hbd13f7d_0 2025-05-07T19:46:05.9280671Z cuda-nvrtc conda-forge/linux-64::cuda-nvrtc-12.8.61-hbd13f7d_0 2025-05-07T19:46:05.9281254Z cuda-nvrtc-dev conda-forge/linux-64::cuda-nvrtc-dev-12.8.61-h5888daf_0 2025-05-07T19:46:05.9281797Z cuda-nvtx conda-forge/linux-64::cuda-nvtx-12.8.55-hbd13f7d_0 2025-05-07T19:46:05.9282394Z cuda-nvvm-dev_lin~ conda-forge/noarch::cuda-nvvm-dev_linux-64-12.8.61-ha770c72_1 2025-05-07T19:46:05.9283008Z cuda-nvvm-impl conda-forge/linux-64::cuda-nvvm-impl-12.8.61-he02047a_1 2025-05-07T19:46:05.9283631Z cuda-nvvm-tools conda-forge/linux-64::cuda-nvvm-tools-12.8.61-he02047a_1 2025-05-07T19:46:05.9284210Z cuda-nvvp conda-forge/linux-64::cuda-nvvp-12.8.57-hbd13f7d_0 2025-05-07T19:46:05.9284737Z cuda-opencl conda-forge/linux-64::cuda-opencl-12.8.55-hbd13f7d_0 2025-05-07T19:46:05.9285339Z cuda-opencl-dev conda-forge/linux-64::cuda-opencl-dev-12.8.55-h5888daf_0 2025-05-07T19:46:05.9286108Z cuda-profiler-api conda-forge/linux-64::cuda-profiler-api-12.8.55-h7938cbb_0 2025-05-07T19:46:05.9286746Z cuda-runtime conda-forge/noarch::cuda-runtime-12.8.0-ha804496_0 2025-05-07T19:46:05.9287500Z cuda-sanitizer-api conda-forge/linux-64::cuda-sanitizer-api-12.8.55-hbd13f7d_0 2025-05-07T19:46:05.9288061Z cuda-toolkit conda-forge/noarch::cuda-toolkit-12.8.0-ha804496_0 2025-05-07T19:46:05.9288582Z cuda-tools conda-forge/linux-64::cuda-tools-12.8.0-ha770c72_0 2025-05-07T19:46:05.9289072Z cuda-version conda-forge/noarch::cuda-version-12.8-h5d125a7_3 2025-05-07T19:46:05.9289644Z cuda-visual-tools conda-forge/linux-64::cuda-visual-tools-12.8.0-ha770c72_0 2025-05-07T19:46:05.9290229Z cxx-compiler conda-forge/linux-64::cxx-compiler-1.5.2-hf52228f_0 2025-05-07T19:46:05.9290696Z dbus conda-forge/linux-64::dbus-1.13.6-h5008d03_3 2025-05-07T19:46:05.9291126Z gcc conda-forge/linux-64::gcc-11.4.0-h602e360_13 2025-05-07T19:46:05.9291743Z gds-tools conda-forge/linux-64::gds-tools-1.13.0.11-h5888daf_0 2025-05-07T19:46:05.9293422Z gmp conda-forge/linux-64::gmp-6.3.0-hac33072_2 2025-05-07T19:46:05.9293890Z gxx conda-forge/linux-64::gxx-11.4.0-h602e360_13 2025-05-07T19:46:05.9294326Z libcap conda-forge/linux-64::libcap-2.75-h39aace5_0 2025-05-07T19:46:05.9294832Z libcublas conda-forge/linux-64::libcublas-12.8.3.14-h9ab20c4_0 2025-05-07T19:46:05.9295653Z libcublas-dev conda-forge/linux-64::libcublas-dev-12.8.3.14-h9ab20c4_0 2025-05-07T19:46:05.9296230Z libcufft conda-forge/linux-64::libcufft-11.3.3.41-hbd13f7d_0 2025-05-07T19:46:05.9296788Z libcufft-dev conda-forge/linux-64::libcufft-dev-11.3.3.41-h5888daf_0 2025-05-07T19:46:05.9297331Z libcufile conda-forge/linux-64::libcufile-1.13.0.11-h12f29b5_0 2025-05-07T19:46:05.9297905Z libcufile-dev conda-forge/linux-64::libcufile-dev-1.13.0.11-h5888daf_0 2025-05-07T19:46:05.9298460Z libcurand conda-forge/linux-64::libcurand-10.3.9.55-hbd13f7d_0 2025-05-07T19:46:05.9299035Z libcurand-dev conda-forge/linux-64::libcurand-dev-10.3.9.55-h5888daf_0 2025-05-07T19:46:05.9299622Z libcusolver conda-forge/linux-64::libcusolver-11.7.2.55-h9ab20c4_0 2025-05-07T19:46:05.9300210Z libcusolver-dev conda-forge/linux-64::libcusolver-dev-11.7.2.55-h9ab20c4_0 2025-05-07T19:46:05.9300830Z libcusparse conda-forge/linux-64::libcusparse-12.5.7.53-hbd13f7d_0 2025-05-07T19:46:05.9301420Z libcusparse-dev conda-forge/linux-64::libcusparse-dev-12.5.7.53-h5888daf_0 2025-05-07T19:46:05.9302038Z libgcrypt-lib conda-forge/linux-64::libgcrypt-lib-1.11.0-hb9d3cd8_2 2025-05-07T19:46:05.9302596Z libglvnd conda-forge/linux-64::libglvnd-1.7.0-ha4b6fd6_2 2025-05-07T19:46:05.9303117Z libgpg-error conda-forge/linux-64::libgpg-error-1.55-h3f2d84a_0 2025-05-07T19:46:05.9303642Z libnl conda-forge/linux-64::libnl-3.11.0-hb9d3cd8_0 2025-05-07T19:46:05.9304103Z libnpp conda-forge/linux-64::libnpp-12.3.3.65-hbd13f7d_0 2025-05-07T19:46:05.9304639Z libnpp-dev conda-forge/linux-64::libnpp-dev-12.3.3.65-h5888daf_0 2025-05-07T19:46:05.9305167Z libnuma conda-forge/linux-64::libnuma-2.0.18-h4ab18f5_2 2025-05-07T19:46:05.9305669Z libnvfatbin conda-forge/linux-64::libnvfatbin-12.8.55-hbd13f7d_0 2025-05-07T19:46:05.9306270Z libnvfatbin-dev conda-forge/linux-64::libnvfatbin-dev-12.8.55-h5888daf_0 2025-05-07T19:46:05.9306855Z libnvjitlink conda-forge/linux-64::libnvjitlink-12.8.61-hbd13f7d_0 2025-05-07T19:46:05.9307471Z libnvjitlink-dev conda-forge/linux-64::libnvjitlink-dev-12.8.61-h5888daf_0 2025-05-07T19:46:05.9308180Z libnvjpeg conda-forge/linux-64::libnvjpeg-12.3.5.57-h97fd463_0 2025-05-07T19:46:05.9308723Z libnvjpeg-dev conda-forge/linux-64::libnvjpeg-dev-12.3.5.57-ha770c72_0 2025-05-07T19:46:05.9309383Z libopengl conda-forge/linux-64::libopengl-1.7.0-ha4b6fd6_2 2025-05-07T19:46:05.9309893Z libsystemd0 conda-forge/linux-64::libsystemd0-257.4-h4e0b6ca_1 2025-05-07T19:46:05.9310426Z libudev1 conda-forge/linux-64::libudev1-257.4-hbe16f8c_1 2025-05-07T19:46:05.9310966Z libxkbcommon conda-forge/linux-64::libxkbcommon-1.7.0-h2c5496b_1 2025-05-07T19:46:05.9311485Z libxkbfile conda-forge/linux-64::libxkbfile-1.1.0-h166bdaf_1 2025-05-07T19:46:05.9311972Z lz4-c conda-forge/linux-64::lz4-c-1.10.0-h5888daf_1 2025-05-07T19:46:05.9312485Z nsight-compute conda-forge/linux-64::nsight-compute-2025.1.0.14-hb5ebaad_0 2025-05-07T19:46:05.9313024Z nspr conda-forge/linux-64::nspr-4.36-h5888daf_0 2025-05-07T19:46:05.9313458Z nss conda-forge/linux-64::nss-3.111-h159eef7_0 2025-05-07T19:46:05.9313880Z ocl-icd conda-forge/linux-64::ocl-icd-2.3.3-hb9d3cd8_0 2025-05-07T19:46:05.9314425Z opencl-headers conda-forge/linux-64::opencl-headers-2024.10.24-h5888daf_0 2025-05-07T19:46:05.9314967Z rdma-core conda-forge/linux-64::rdma-core-57.0-h5888daf_0 2025-05-07T19:46:05.9315535Z wayland conda-forge/linux-64::wayland-1.23.1-h3e06ad9_0 2025-05-07T19:46:05.9316023Z xcb-util conda-forge/linux-64::xcb-util-0.4.1-hb711507_2 2025-05-07T19:46:05.9316533Z xcb-util-cursor conda-forge/linux-64::xcb-util-cursor-0.1.5-hb9d3cd8_0 2025-05-07T19:46:05.9317112Z xcb-util-image conda-forge/linux-64::xcb-util-image-0.4.0-hb711507_2 2025-05-07T19:46:05.9317682Z xcb-util-keysyms conda-forge/linux-64::xcb-util-keysyms-0.4.1-hb711507_0 2025-05-07T19:46:05.9318323Z xcb-util-renderut~ conda-forge/linux-64::xcb-util-renderutil-0.3.10-hb711507_0 2025-05-07T19:46:05.9318989Z xcb-util-wm conda-forge/linux-64::xcb-util-wm-0.4.2-hb711507_0 2025-05-07T19:46:05.9319536Z xkeyboard-config conda-forge/linux-64::xkeyboard-config-2.44-hb9d3cd8_0 2025-05-07T19:46:05.9320143Z xorg-libxcomposite conda-forge/linux-64::xorg-libxcomposite-0.4.6-hb9d3cd8_2 2025-05-07T19:46:05.9320724Z xorg-libxdamage conda-forge/linux-64::xorg-libxdamage-1.1.6-hb9d3cd8_0 2025-05-07T19:46:05.9321074Z 2025-05-07T19:46:05.9321110Z 2025-05-07T19:46:05.9321114Z 2025-05-07T19:46:05.9321264Z Downloading and Extracting Packages: ...working... 2025-05-07T19:46:05.9321672Z libcublas-12.8.3.14 | 460.2 MB | | 0% 2025-05-07T19:46:05.9321910Z 2025-05-07T19:46:05.9322377Z nsight-compute-2025. | 320.6 MB | | 0%  2025-05-07T19:46:05.9322657Z 2025-05-07T19:46:05.9322660Z 2025-05-07T19:46:05.9322883Z libcusparse-12.5.7.5 | 164.9 MB | | 0%  2025-05-07T19:46:05.9323142Z 2025-05-07T19:46:05.9323146Z 2025-05-07T19:46:05.9323149Z 2025-05-07T19:46:05.9329005Z libcusolver-11.7.2.5 | 156.9 MB | | 0%  2025-05-07T19:46:05.9329644Z 2025-05-07T19:46:05.9329662Z 2025-05-07T19:46:05.9329670Z 2025-05-07T19:46:05.9329725Z 2025-05-07T19:46:05.9341915Z libcufft-11.3.3.41 | 147.4 MB | | 0%  2025-05-07T19:46:05.9342240Z 2025-05-07T19:46:05.9342262Z 2025-05-07T19:46:05.9342268Z 2025-05-07T19:46:05.9342277Z 2025-05-07T19:46:05.9342299Z 2025-05-07T19:46:05.9342590Z libnpp-12.3.3.65 | 130.6 MB | | 0%  2025-05-07T19:46:05.9342869Z 2025-05-07T19:46:05.9342873Z 2025-05-07T19:46:05.9342877Z 2025-05-07T19:46:05.9342881Z 2025-05-07T19:46:05.9342885Z 2025-05-07T19:46:05.9342906Z 2025-05-07T19:46:05.9344652Z cuda-nsight-12.8.55 | 113.2 MB | | 0%  2025-05-07T19:46:05.9345723Z 2025-05-07T19:46:05.9345767Z 2025-05-07T19:46:05.9345778Z 2025-05-07T19:46:05.9345788Z 2025-05-07T19:46:05.9345798Z 2025-05-07T19:46:05.9345810Z 2025-05-07T19:46:05.9345821Z 2025-05-07T19:46:05.9346579Z cuda-nvvp-12.8.57 | 112.4 MB | | 0%  2025-05-07T19:46:05.9347449Z 2025-05-07T19:46:05.9347462Z 2025-05-07T19:46:05.9347695Z 2025-05-07T19:46:05.9347706Z 2025-05-07T19:46:05.9347717Z 2025-05-07T19:46:05.9347728Z 2025-05-07T19:46:05.9347738Z 2025-05-07T19:46:05.9347748Z 2025-05-07T19:46:05.9348515Z cuda-nvrtc-12.8.61 | 63.1 MB | | 0%  2025-05-07T19:46:05.9349360Z 2025-05-07T19:46:05.9349371Z 2025-05-07T19:46:05.9349411Z 2025-05-07T19:46:05.9349421Z 2025-05-07T19:46:05.9349432Z 2025-05-07T19:46:05.9349441Z 2025-05-07T19:46:05.9349451Z 2025-05-07T19:46:05.9349461Z 2025-05-07T19:46:05.9349471Z 2025-05-07T19:46:05.9350201Z libcurand-10.3.9.55 | 43.6 MB | | 0%  2025-05-07T19:46:05.9350535Z 2025-05-07T19:46:05.9350538Z 2025-05-07T19:46:05.9350542Z 2025-05-07T19:46:05.9350546Z 2025-05-07T19:46:05.9350549Z 2025-05-07T19:46:05.9350552Z 2025-05-07T19:46:05.9350556Z 2025-05-07T19:46:05.9350559Z 2025-05-07T19:46:05.9350562Z 2025-05-07T19:46:05.9350565Z 2025-05-07T19:46:05.9350836Z gds-tools-1.13.0.11 | 37.9 MB | | 0%  2025-05-07T19:46:05.9351273Z 2025-05-07T19:46:05.9351277Z 2025-05-07T19:46:05.9351281Z 2025-05-07T19:46:05.9351284Z 2025-05-07T19:46:05.9351288Z 2025-05-07T19:46:05.9351291Z 2025-05-07T19:46:05.9351295Z 2025-05-07T19:46:05.9351386Z 2025-05-07T19:46:05.9351390Z 2025-05-07T19:46:05.9351393Z 2025-05-07T19:46:05.9351397Z 2025-05-07T19:46:05.9351698Z libnvjitlink-12.8.61 | 28.7 MB | | 0%  2025-05-07T19:46:05.9352047Z 2025-05-07T19:46:05.9352050Z 2025-05-07T19:46:05.9352055Z 2025-05-07T19:46:05.9352058Z 2025-05-07T19:46:05.9352062Z 2025-05-07T19:46:05.9352065Z 2025-05-07T19:46:05.9352069Z 2025-05-07T19:46:05.9352072Z 2025-05-07T19:46:05.9352075Z 2025-05-07T19:46:05.9352078Z 2025-05-07T19:46:05.9352081Z 2025-05-07T19:46:05.9352084Z 2025-05-07T19:46:05.9352409Z cuda-nvcc-tools-12.8 | 24.5 MB | | 0%  2025-05-07T19:46:05.9352763Z 2025-05-07T19:46:05.9352767Z 2025-05-07T19:46:05.9352775Z 2025-05-07T19:46:05.9352778Z 2025-05-07T19:46:05.9352782Z 2025-05-07T19:46:05.9352785Z 2025-05-07T19:46:05.9352788Z 2025-05-07T19:46:05.9352792Z 2025-05-07T19:46:05.9352796Z 2025-05-07T19:46:05.9352800Z 2025-05-07T19:46:05.9352807Z 2025-05-07T19:46:05.9352811Z 2025-05-07T19:46:05.9352814Z 2025-05-07T19:46:05.9353134Z cuda-nvvm-tools-12.8 | 23.5 MB | | 0%  2025-05-07T19:46:05.9353462Z 2025-05-07T19:46:05.9353466Z 2025-05-07T19:46:05.9353470Z 2025-05-07T19:46:05.9353473Z 2025-05-07T19:46:05.9353477Z 2025-05-07T19:46:05.9353480Z 2025-05-07T19:46:05.9353484Z 2025-05-07T19:46:05.9353487Z 2025-05-07T19:46:05.9353490Z 2025-05-07T19:46:05.9353494Z 2025-05-07T19:46:05.9353497Z 2025-05-07T19:46:05.9353500Z 2025-05-07T19:46:05.9353503Z 2025-05-07T19:46:05.9353506Z 2025-05-07T19:46:05.9360511Z cuda-nvvm-impl-12.8. | 20.8 MB | | 0%  2025-05-07T19:46:05.9361101Z 2025-05-07T19:46:05.9361113Z 2025-05-07T19:46:05.9361141Z 2025-05-07T19:46:05.9361146Z 2025-05-07T19:46:05.9361150Z 2025-05-07T19:46:05.9361157Z 2025-05-07T19:46:05.9361190Z 2025-05-07T19:46:05.9361195Z 2025-05-07T19:46:05.9361201Z 2025-05-07T19:46:05.9361212Z 2025-05-07T19:46:05.9361216Z 2025-05-07T19:46:05.9361221Z 2025-05-07T19:46:05.9361226Z 2025-05-07T19:46:05.9361231Z 2025-05-07T19:46:05.9361237Z 2025-05-07T19:46:05.9361852Z cuda-nvcc-dev_linux- | 12.7 MB | | 0%  2025-05-07T19:46:05.9362342Z 2025-05-07T19:46:05.9362377Z 2025-05-07T19:46:05.9362386Z 2025-05-07T19:46:05.9362391Z 2025-05-07T19:46:05.9362399Z 2025-05-07T19:46:05.9362408Z 2025-05-07T19:46:05.9362413Z 2025-05-07T19:46:05.9362420Z 2025-05-07T19:46:05.9362428Z 2025-05-07T19:46:05.9362434Z 2025-05-07T19:46:05.9362441Z 2025-05-07T19:46:05.9362449Z 2025-05-07T19:46:05.9362457Z 2025-05-07T19:46:05.9362462Z 2025-05-07T19:46:05.9362469Z 2025-05-07T19:46:05.9362494Z 2025-05-07T19:46:05.9363250Z cuda-sanitizer-api-1 | 8.8 MB | | 0%  2025-05-07T19:46:05.9363844Z 2025-05-07T19:46:05.9363850Z 2025-05-07T19:46:05.9363857Z 2025-05-07T19:46:05.9363870Z 2025-05-07T19:46:05.9363877Z 2025-05-07T19:46:05.9363885Z 2025-05-07T19:46:05.9363891Z 2025-05-07T19:46:05.9363898Z 2025-05-07T19:46:05.9363906Z 2025-05-07T19:46:05.9363911Z 2025-05-07T19:46:05.9363919Z 2025-05-07T19:46:05.9363927Z 2025-05-07T19:46:05.9363935Z 2025-05-07T19:46:05.9363940Z 2025-05-07T19:46:05.9363948Z 2025-05-07T19:46:05.9363957Z 2025-05-07T19:46:05.9363997Z 2025-05-07T19:46:05.9364532Z cuda-nvdisasm-12.8.5 | 4.9 MB | | 0%  2025-05-07T19:46:05.9364907Z 2025-05-07T19:46:05.9364912Z 2025-05-07T19:46:05.9364921Z 2025-05-07T19:46:05.9364926Z 2025-05-07T19:46:05.9364931Z 2025-05-07T19:46:05.9364936Z 2025-05-07T19:46:05.9364970Z 2025-05-07T19:46:05.9364975Z 2025-05-07T19:46:05.9364987Z 2025-05-07T19:46:05.9364995Z 2025-05-07T19:46:05.9364999Z 2025-05-07T19:46:05.9365004Z 2025-05-07T19:46:05.9365008Z 2025-05-07T19:46:05.9365013Z 2025-05-07T19:46:05.9365017Z 2025-05-07T19:46:05.9365022Z 2025-05-07T19:46:05.9365132Z 2025-05-07T19:46:05.9365137Z 2025-05-07T19:46:05.9365793Z cuda-cupti-dev-12.8. | 4.0 MB | | 0%  2025-05-07T19:46:05.9366245Z 2025-05-07T19:46:05.9366248Z 2025-05-07T19:46:05.9366252Z 2025-05-07T19:46:05.9366255Z 2025-05-07T19:46:05.9366259Z 2025-05-07T19:46:05.9366262Z 2025-05-07T19:46:05.9366265Z 2025-05-07T19:46:05.9366269Z 2025-05-07T19:46:05.9366272Z 2025-05-07T19:46:05.9366276Z 2025-05-07T19:46:05.9366279Z 2025-05-07T19:46:05.9366282Z 2025-05-07T19:46:05.9366285Z 2025-05-07T19:46:05.9366289Z 2025-05-07T19:46:05.9366292Z 2025-05-07T19:46:05.9366295Z 2025-05-07T19:46:05.9366298Z 2025-05-07T19:46:05.9366301Z 2025-05-07T19:46:05.9366339Z 2025-05-07T19:46:06.0298975Z ... (more hidden) ... 2025-05-07T19:46:06.0299846Z libcublas-12.8.3.14 | 460.2 MB | | 1% 2025-05-07T19:46:06.0300144Z 2025-05-07T19:46:06.0302821Z nsight-compute-2025. | 320.6 MB | 1 | 1%  2025-05-07T19:46:06.0303363Z 2025-05-07T19:46:06.0303423Z 2025-05-07T19:46:06.0309249Z libcusparse-12.5.7.5 | 164.9 MB | 3 | 4%  2025-05-07T19:46:06.0310628Z 2025-05-07T19:46:06.0310640Z 2025-05-07T19:46:06.0310650Z 2025-05-07T19:46:06.0677371Z libcusolver-11.7.2.5 | 156.9 MB | | 1%  2025-05-07T19:46:06.0677728Z 2025-05-07T19:46:06.0677735Z 2025-05-07T19:46:06.0677742Z 2025-05-07T19:46:06.0677747Z 2025-05-07T19:46:06.1299316Z libcufft-11.3.3.41 | 147.4 MB | | 0%  2025-05-07T19:46:06.1299976Z libcublas-12.8.3.14 | 460.2 MB | 2 | 3% 2025-05-07T19:46:06.1300240Z 2025-05-07T19:46:06.1309279Z nsight-compute-2025. | 320.6 MB | 3 | 3%  2025-05-07T19:46:06.1309838Z 2025-05-07T19:46:06.1309844Z 2025-05-07T19:46:06.1309849Z 2025-05-07T19:46:06.1346564Z libcusolver-11.7.2.5 | 156.9 MB | 8 | 8%  2025-05-07T19:46:06.1347738Z 2025-05-07T19:46:06.1347754Z 2025-05-07T19:46:06.1678530Z libcusparse-12.5.7.5 | 164.9 MB | 9 | 10%  2025-05-07T19:46:06.1679343Z 2025-05-07T19:46:06.1679374Z 2025-05-07T19:46:06.1679378Z 2025-05-07T19:46:06.1679382Z 2025-05-07T19:46:06.2300039Z libcufft-11.3.3.41 | 147.4 MB | 2 | 3%  2025-05-07T19:46:06.2314926Z libcublas-12.8.3.14 | 460.2 MB | 4 | 5% 2025-05-07T19:46:06.2316327Z 2025-05-07T19:46:06.2316343Z 2025-05-07T19:46:06.2316355Z 2025-05-07T19:46:06.2324805Z libcusolver-11.7.2.5 | 156.9 MB | #4 | 15%  2025-05-07T19:46:06.2326041Z 2025-05-07T19:46:06.2619158Z nsight-compute-2025. | 320.6 MB | 5 | 5%  2025-05-07T19:46:06.2620286Z 2025-05-07T19:46:06.2620301Z 2025-05-07T19:46:06.2678097Z libcusparse-12.5.7.5 | 164.9 MB | #4 | 15%  2025-05-07T19:46:06.2679778Z 2025-05-07T19:46:06.2679794Z 2025-05-07T19:46:06.2679798Z 2025-05-07T19:46:06.2679803Z 2025-05-07T19:46:06.3328554Z libcufft-11.3.3.41 | 147.4 MB | 8 | 9%  2025-05-07T19:46:06.3328889Z 2025-05-07T19:46:06.3342737Z nsight-compute-2025. | 320.6 MB | 7 | 8%  2025-05-07T19:46:06.3378817Z libcublas-12.8.3.14 | 460.2 MB | 6 | 6% 2025-05-07T19:46:06.3379617Z 2025-05-07T19:46:06.3379632Z 2025-05-07T19:46:06.3379690Z 2025-05-07T19:46:06.3618595Z libcusolver-11.7.2.5 | 156.9 MB | ## | 20%  2025-05-07T19:46:06.3619317Z 2025-05-07T19:46:06.3619323Z 2025-05-07T19:46:06.3677976Z libcusparse-12.5.7.5 | 164.9 MB | #9 | 20%  2025-05-07T19:46:06.3678858Z 2025-05-07T19:46:06.3678873Z 2025-05-07T19:46:06.3678886Z 2025-05-07T19:46:06.3678897Z 2025-05-07T19:46:06.4331128Z libcufft-11.3.3.41 | 147.4 MB | #4 | 15%  2025-05-07T19:46:06.4331522Z 2025-05-07T19:46:06.4343226Z nsight-compute-2025. | 320.6 MB | #1 | 11%  2025-05-07T19:46:06.4681484Z libcublas-12.8.3.14 | 460.2 MB | 8 | 8% 2025-05-07T19:46:06.4681779Z 2025-05-07T19:46:06.4682012Z 2025-05-07T19:46:06.4682030Z 2025-05-07T19:46:06.4682034Z 2025-05-07T19:46:06.4695343Z libcufft-11.3.3.41 | 147.4 MB | #9 | 20%  2025-05-07T19:46:06.4696239Z 2025-05-07T19:46:06.4696251Z 2025-05-07T19:46:06.4696262Z 2025-05-07T19:46:06.5304329Z libcusolver-11.7.2.5 | 156.9 MB | ##5 | 26%  2025-05-07T19:46:06.5305240Z 2025-05-07T19:46:06.5305254Z 2025-05-07T19:46:06.5345638Z libcusparse-12.5.7.5 | 164.9 MB | ##4 | 25%  2025-05-07T19:46:06.5379697Z libcublas-12.8.3.14 | 460.2 MB | # | 10% 2025-05-07T19:46:06.5379989Z 2025-05-07T19:46:06.5683829Z nsight-compute-2025. | 320.6 MB | #3 | 14%  2025-05-07T19:46:06.5684169Z 2025-05-07T19:46:06.5684176Z 2025-05-07T19:46:06.5684182Z 2025-05-07T19:46:06.5684231Z 2025-05-07T19:46:06.5695906Z libcufft-11.3.3.41 | 147.4 MB | ##5 | 25%  2025-05-07T19:46:06.5696818Z 2025-05-07T19:46:06.5696834Z 2025-05-07T19:46:06.5696845Z 2025-05-07T19:46:06.6363918Z libcusolver-11.7.2.5 | 156.9 MB | ### | 31%  2025-05-07T19:46:06.6381090Z libcublas-12.8.3.14 | 460.2 MB | #1 | 12% 2025-05-07T19:46:06.6381619Z 2025-05-07T19:46:06.6409162Z nsight-compute-2025. | 320.6 MB | #6 | 16%  2025-05-07T19:46:06.6409491Z 2025-05-07T19:46:06.6409498Z 2025-05-07T19:46:06.6684895Z libcusparse-12.5.7.5 | 164.9 MB | ##8 | 29%  2025-05-07T19:46:06.6685756Z 2025-05-07T19:46:06.6685772Z 2025-05-07T19:46:06.6685785Z 2025-05-07T19:46:06.6685814Z 2025-05-07T19:46:06.6817973Z libcufft-11.3.3.41 | 147.4 MB | ### | 30%  2025-05-07T19:46:06.6818833Z 2025-05-07T19:46:06.7412205Z 2025-05-07T19:46:06.7412213Z 2025-05-07T19:46:06.7412920Z libcusolver-11.7.2.5 | 156.9 MB | ###5 | 36%  2025-05-07T19:46:06.7413398Z 2025-05-07T19:46:06.7413404Z 2025-05-07T19:46:06.7423715Z libcusparse-12.5.7.5 | 164.9 MB | ###2 | 33%  2025-05-07T19:46:06.7424033Z 2025-05-07T19:46:06.7434477Z nsight-compute-2025. | 320.6 MB | #8 | 19%  2025-05-07T19:46:06.7684849Z libcublas-12.8.3.14 | 460.2 MB | #3 | 14% 2025-05-07T19:46:06.7685644Z 2025-05-07T19:46:06.7685680Z 2025-05-07T19:46:06.7685691Z 2025-05-07T19:46:06.7685703Z 2025-05-07T19:46:06.7818639Z libcufft-11.3.3.41 | 147.4 MB | ###5 | 35%  2025-05-07T19:46:06.7819505Z 2025-05-07T19:46:06.7819533Z 2025-05-07T19:46:06.7819545Z 2025-05-07T19:46:06.8427069Z libcusolver-11.7.2.5 | 156.9 MB | ####1 | 41%  2025-05-07T19:46:06.8427607Z 2025-05-07T19:46:06.8436244Z nsight-compute-2025. | 320.6 MB | ##1 | 21%  2025-05-07T19:46:06.8684067Z libcublas-12.8.3.14 | 460.2 MB | #5 | 16% 2025-05-07T19:46:06.8684353Z 2025-05-07T19:46:06.8684575Z 2025-05-07T19:46:06.8684579Z 2025-05-07T19:46:06.8684584Z 2025-05-07T19:46:06.8819005Z libcufft-11.3.3.41 | 147.4 MB | ####1 | 41%  2025-05-07T19:46:06.8819344Z 2025-05-07T19:46:06.8819350Z 2025-05-07T19:46:06.8819391Z 2025-05-07T19:46:06.9072212Z libcusolver-11.7.2.5 | 156.9 MB | ####7 | 47%  2025-05-07T19:46:06.9072532Z 2025-05-07T19:46:06.9072562Z 2025-05-07T19:46:06.9427325Z libcusparse-12.5.7.5 | 164.9 MB | ###6 | 37%  2025-05-07T19:46:06.9427653Z 2025-05-07T19:46:06.9638180Z nsight-compute-2025. | 320.6 MB | ##4 | 24%  2025-05-07T19:46:06.9819740Z libcublas-12.8.3.14 | 460.2 MB | #7 | 17% 2025-05-07T19:46:06.9820021Z 2025-05-07T19:46:06.9820027Z 2025-05-07T19:46:06.9820053Z 2025-05-07T19:46:06.9857973Z libcusolver-11.7.2.5 | 156.9 MB | #####2 | 53%  2025-05-07T19:46:06.9858846Z 2025-05-07T19:46:06.9858880Z 2025-05-07T19:46:06.9858891Z 2025-05-07T19:46:06.9858902Z 2025-05-07T19:46:07.0462046Z libcufft-11.3.3.41 | 147.4 MB | ####6 | 46%  2025-05-07T19:46:07.0462371Z 2025-05-07T19:46:07.0552960Z nsight-compute-2025. | 320.6 MB | ##6 | 27%  2025-05-07T19:46:07.0553836Z 2025-05-07T19:46:07.0553850Z 2025-05-07T19:46:07.0639306Z libcusparse-12.5.7.5 | 164.9 MB | #### | 41%  2025-05-07T19:46:07.0857679Z libcublas-12.8.3.14 | 460.2 MB | #9 | 19% 2025-05-07T19:46:07.0857975Z 2025-05-07T19:46:07.0857979Z 2025-05-07T19:46:07.0857983Z 2025-05-07T19:46:07.0857987Z 2025-05-07T19:46:07.0894273Z libcufft-11.3.3.41 | 147.4 MB | #####1 | 52%  2025-05-07T19:46:07.0895352Z 2025-05-07T19:46:07.0895391Z 2025-05-07T19:46:07.0895404Z 2025-05-07T19:46:07.1514535Z libcusolver-11.7.2.5 | 156.9 MB | #####8 | 58%  2025-05-07T19:46:07.1515442Z 2025-05-07T19:46:07.1554124Z nsight-compute-2025. | 320.6 MB | ##9 | 29%  2025-05-07T19:46:07.1554975Z 2025-05-07T19:46:07.1554990Z 2025-05-07T19:46:07.1726793Z libcusparse-12.5.7.5 | 164.9 MB | ####4 | 45%  2025-05-07T19:46:07.1857340Z libcublas-12.8.3.14 | 460.2 MB | ## | 21% 2025-05-07T19:46:07.1857670Z 2025-05-07T19:46:07.1857676Z 2025-05-07T19:46:07.1857680Z 2025-05-07T19:46:07.1857712Z 2025-05-07T19:46:07.1923674Z libcufft-11.3.3.41 | 147.4 MB | #####6 | 57%  2025-05-07T19:46:07.1923995Z 2025-05-07T19:46:07.1924000Z 2025-05-07T19:46:07.1924004Z 2025-05-07T19:46:07.2556834Z libcusolver-11.7.2.5 | 156.9 MB | ######3 | 64%  2025-05-07T19:46:07.2557732Z 2025-05-07T19:46:07.2557750Z 2025-05-07T19:46:07.2586811Z libcusparse-12.5.7.5 | 164.9 MB | ####8 | 49%  2025-05-07T19:46:07.2587711Z 2025-05-07T19:46:07.2777017Z nsight-compute-2025. | 320.6 MB | ###1 | 32%  2025-05-07T19:46:07.2933578Z libcublas-12.8.3.14 | 460.2 MB | ##2 | 22% 2025-05-07T19:46:07.2934140Z 2025-05-07T19:46:07.2934157Z 2025-05-07T19:46:07.2934164Z 2025-05-07T19:46:07.2934171Z 2025-05-07T19:46:07.3006166Z libcufft-11.3.3.41 | 147.4 MB | ######2 | 62%  2025-05-07T19:46:07.3006527Z 2025-05-07T19:46:07.3006533Z 2025-05-07T19:46:07.3006538Z 2025-05-07T19:46:07.3567275Z libcusolver-11.7.2.5 | 156.9 MB | ######8 | 69%  2025-05-07T19:46:07.3567629Z 2025-05-07T19:46:07.3567635Z 2025-05-07T19:46:07.3605696Z libcusparse-12.5.7.5 | 164.9 MB | #####2 | 53%  2025-05-07T19:46:07.3606003Z 2025-05-07T19:46:07.3934465Z nsight-compute-2025. | 320.6 MB | ###4 | 34%  2025-05-07T19:46:07.3934767Z 2025-05-07T19:46:07.3934957Z 2025-05-07T19:46:07.3934968Z 2025-05-07T19:46:07.3934976Z 2025-05-07T19:46:07.3973023Z libcufft-11.3.3.41 | 147.4 MB | ######8 | 68%  2025-05-07T19:46:07.4006845Z libcublas-12.8.3.14 | 460.2 MB | ##4 | 24% 2025-05-07T19:46:07.4007159Z 2025-05-07T19:46:07.4007166Z 2025-05-07T19:46:07.4007172Z 2025-05-07T19:46:07.4783140Z libcusolver-11.7.2.5 | 156.9 MB | #######4 | 74%  2025-05-07T19:46:07.4783754Z 2025-05-07T19:46:07.5082038Z nsight-compute-2025. | 320.6 MB | ###6 | 37%  2025-05-07T19:46:07.5082375Z 2025-05-07T19:46:07.5082381Z 2025-05-07T19:46:07.5186319Z libcusparse-12.5.7.5 | 164.9 MB | #####6 | 56%  2025-05-07T19:46:07.5187227Z 2025-05-07T19:46:07.5187260Z 2025-05-07T19:46:07.5187272Z 2025-05-07T19:46:07.5187283Z 2025-05-07T19:46:07.5337941Z libcufft-11.3.3.41 | 147.4 MB | #######3 | 73%  2025-05-07T19:46:07.5338260Z 2025-05-07T19:46:07.5338265Z 2025-05-07T19:46:07.5338270Z 2025-05-07T19:46:07.5403912Z libcusolver-11.7.2.5 | 156.9 MB | #######9 | 80%  2025-05-07T19:46:07.6110731Z libcublas-12.8.3.14 | 460.2 MB | ##5 | 26% 2025-05-07T19:46:07.6111110Z 2025-05-07T19:46:07.6111155Z 2025-05-07T19:46:07.6143305Z libcusparse-12.5.7.5 | 164.9 MB | #####9 | 60%  2025-05-07T19:46:07.6143961Z 2025-05-07T19:46:07.6540818Z nsight-compute-2025. | 320.6 MB | ###9 | 39%  2025-05-07T19:46:07.6541139Z 2025-05-07T19:46:07.6541173Z 2025-05-07T19:46:07.6541177Z 2025-05-07T19:46:07.6541181Z 2025-05-07T19:46:07.6573627Z libcufft-11.3.3.41 | 147.4 MB | #######8 | 78%  2025-05-07T19:46:07.6573966Z 2025-05-07T19:46:07.6573972Z 2025-05-07T19:46:07.6574378Z 2025-05-07T19:46:07.6768287Z libcusolver-11.7.2.5 | 156.9 MB | ########4 | 84%  2025-05-07T19:46:07.7114291Z libcublas-12.8.3.14 | 460.2 MB | ##6 | 27% 2025-05-07T19:46:07.7115092Z 2025-05-07T19:46:07.7115108Z 2025-05-07T19:46:07.7260162Z libcusparse-12.5.7.5 | 164.9 MB | ######3 | 64%  2025-05-07T19:46:07.7261470Z 2025-05-07T19:46:07.7689935Z nsight-compute-2025. | 320.6 MB | ####1 | 42%  2025-05-07T19:46:07.7690779Z 2025-05-07T19:46:07.7690838Z 2025-05-07T19:46:07.7690851Z 2025-05-07T19:46:07.7865615Z libcusolver-11.7.2.5 | 156.9 MB | ########9 | 89%  2025-05-07T19:46:07.7866507Z 2025-05-07T19:46:07.7866521Z 2025-05-07T19:46:07.7866533Z 2025-05-07T19:46:07.7866559Z 2025-05-07T19:46:07.8102493Z libcufft-11.3.3.41 | 147.4 MB | ########2 | 83%  2025-05-07T19:46:07.8111099Z libcublas-12.8.3.14 | 460.2 MB | ##8 | 28% 2025-05-07T19:46:07.8111375Z 2025-05-07T19:46:07.8111379Z 2025-05-07T19:46:07.8392243Z libcusparse-12.5.7.5 | 164.9 MB | ######7 | 67%  2025-05-07T19:46:07.8392592Z 2025-05-07T19:46:07.8902507Z nsight-compute-2025. | 320.6 MB | ####3 | 44%  2025-05-07T19:46:07.8902825Z 2025-05-07T19:46:07.8902830Z 2025-05-07T19:46:07.8902836Z 2025-05-07T19:46:07.8902842Z 2025-05-07T19:46:07.8941531Z libcufft-11.3.3.41 | 147.4 MB | ########7 | 87%  2025-05-07T19:46:07.8941851Z 2025-05-07T19:46:07.8941857Z 2025-05-07T19:46:07.8941863Z 2025-05-07T19:46:07.9116560Z libcusolver-11.7.2.5 | 156.9 MB | #########3 | 94%  2025-05-07T19:46:07.9116882Z 2025-05-07T19:46:07.9116889Z 2025-05-07T19:46:07.9239223Z libcusparse-12.5.7.5 | 164.9 MB | #######1 | 71%  2025-05-07T19:46:07.9418787Z libcublas-12.8.3.14 | 460.2 MB | ##9 | 30% 2025-05-07T19:46:07.9419638Z 2025-05-07T19:46:07.9943355Z nsight-compute-2025. | 320.6 MB | ####5 | 46%  2025-05-07T19:46:07.9944200Z 2025-05-07T19:46:07.9944214Z 2025-05-07T19:46:07.9944226Z 2025-05-07T19:46:07.9966214Z libcusolver-11.7.2.5 | 156.9 MB | #########7 | 98%  2025-05-07T19:46:07.9966567Z 2025-05-07T19:46:07.9966572Z 2025-05-07T19:46:07.9966577Z 2025-05-07T19:46:07.9966581Z 2025-05-07T19:46:08.0119022Z libcufft-11.3.3.41 | 147.4 MB | #########1 | 91%  2025-05-07T19:46:08.0119903Z 2025-05-07T19:46:08.0119963Z 2025-05-07T19:46:08.0412160Z libcusparse-12.5.7.5 | 164.9 MB | #######4 | 75%  2025-05-07T19:46:08.0591688Z libcublas-12.8.3.14 | 460.2 MB | ### | 31% 2025-05-07T19:46:08.0592075Z 2025-05-07T19:46:08.0969262Z nsight-compute-2025. | 320.6 MB | ####7 | 48%  2025-05-07T19:46:08.0969952Z 2025-05-07T19:46:08.0969968Z 2025-05-07T19:46:08.0969973Z 2025-05-07T19:46:08.0969976Z 2025-05-07T19:46:08.1121233Z libcufft-11.3.3.41 | 147.4 MB | #########5 | 96%  2025-05-07T19:46:08.1121557Z 2025-05-07T19:46:08.1121563Z 2025-05-07T19:46:08.1413468Z libcusparse-12.5.7.5 | 164.9 MB | #######8 | 79%  2025-05-07T19:46:08.1886785Z libcublas-12.8.3.14 | 460.2 MB | ###1 | 32% 2025-05-07T19:46:08.1887092Z 2025-05-07T19:46:08.2122861Z nsight-compute-2025. | 320.6 MB | ####9 | 50%  2025-05-07T19:46:08.2123231Z 2025-05-07T19:46:08.2123239Z 2025-05-07T19:46:08.2415290Z libcusparse-12.5.7.5 | 164.9 MB | ########4 | 84%  2025-05-07T19:46:08.2927377Z libcublas-12.8.3.14 | 460.2 MB | ###3 | 34% 2025-05-07T19:46:08.2927701Z 2025-05-07T19:46:08.3164404Z nsight-compute-2025. | 320.6 MB | #####1 | 52%  2025-05-07T19:46:08.3164733Z 2025-05-07T19:46:08.3164739Z 2025-05-07T19:46:08.3820737Z libcusparse-12.5.7.5 | 164.9 MB | ########8 | 89%  2025-05-07T19:46:08.3927043Z libcublas-12.8.3.14 | 460.2 MB | ###5 | 35% 2025-05-07T19:46:08.3927381Z 2025-05-07T19:46:08.4492440Z nsight-compute-2025. | 320.6 MB | #####3 | 54%  2025-05-07T19:46:08.4493310Z 2025-05-07T19:46:08.4493327Z 2025-05-07T19:46:08.4820847Z libcusparse-12.5.7.5 | 164.9 MB | #########3 | 93%  2025-05-07T19:46:08.4938118Z libcublas-12.8.3.14 | 460.2 MB | ###6 | 37% 2025-05-07T19:46:08.4938415Z 2025-05-07T19:46:08.5505831Z nsight-compute-2025. | 320.6 MB | #####5 | 56%  2025-05-07T19:46:08.5506139Z 2025-05-07T19:46:08.5506160Z 2025-05-07T19:46:08.5977385Z libcusparse-12.5.7.5 | 164.9 MB | #########9 | 100%  2025-05-07T19:46:08.5977690Z 2025-05-07T19:46:08.6525392Z nsight-compute-2025. | 320.6 MB | #####7 | 58%  2025-05-07T19:46:08.7583524Z libcublas-12.8.3.14 | 460.2 MB | ###7 | 38% 2025-05-07T19:46:08.7583851Z 2025-05-07T19:46:08.7583858Z 2025-05-07T19:46:08.7583863Z 2025-05-07T19:46:08.7583866Z 2025-05-07T19:46:08.7607479Z libcufft-11.3.3.41 | 147.4 MB | ########## | 100%  2025-05-07T19:46:08.7607850Z 2025-05-07T19:46:08.7607855Z 2025-05-07T19:46:08.7607858Z 2025-05-07T19:46:08.7692047Z libcusolver-11.7.2.5 | 156.9 MB | ########## | 100%  2025-05-07T19:46:08.8042674Z libcublas-12.8.3.14 | 460.2 MB | ###9 | 39% 2025-05-07T19:46:08.8043120Z 2025-05-07T19:46:08.8043129Z 2025-05-07T19:46:08.8043134Z 2025-05-07T19:46:08.8043139Z 2025-05-07T19:46:08.8043143Z 2025-05-07T19:46:08.8043148Z 2025-05-07T19:46:08.8087238Z cuda-nsight-12.8.55 | 113.2 MB | | 0%  2025-05-07T19:46:08.8087590Z 2025-05-07T19:46:08.8087594Z 2025-05-07T19:46:08.8087598Z 2025-05-07T19:46:08.8087603Z 2025-05-07T19:46:08.8088846Z 2025-05-07T19:46:08.8091206Z libnpp-12.3.3.65 | 130.6 MB | | 0%  2025-05-07T19:46:08.8091508Z 2025-05-07T19:46:08.8696911Z nsight-compute-2025. | 320.6 MB | #####9 | 60%  2025-05-07T19:46:08.9043975Z libcublas-12.8.3.14 | 460.2 MB | #### | 40% 2025-05-07T19:46:08.9044352Z 2025-05-07T19:46:08.9044415Z 2025-05-07T19:46:08.9044428Z 2025-05-07T19:46:08.9044431Z 2025-05-07T19:46:08.9044544Z 2025-05-07T19:46:08.9044548Z 2025-05-07T19:46:08.9090768Z cuda-nsight-12.8.55 | 113.2 MB | 6 | 6%  2025-05-07T19:46:08.9091195Z 2025-05-07T19:46:08.9091202Z 2025-05-07T19:46:08.9091207Z 2025-05-07T19:46:08.9091211Z 2025-05-07T19:46:08.9091240Z 2025-05-07T19:46:08.9521503Z libnpp-12.3.3.65 | 130.6 MB | 4 | 4%  2025-05-07T19:46:08.9522392Z 2025-05-07T19:46:08.9698439Z nsight-compute-2025. | 320.6 MB | ######1 | 61%  2025-05-07T19:46:09.0049282Z libcublas-12.8.3.14 | 460.2 MB | ####1 | 42% 2025-05-07T19:46:09.0049584Z 2025-05-07T19:46:09.0049806Z 2025-05-07T19:46:09.0049826Z 2025-05-07T19:46:09.0049833Z 2025-05-07T19:46:09.0049839Z 2025-05-07T19:46:09.0049845Z 2025-05-07T19:46:09.0091032Z cuda-nsight-12.8.55 | 113.2 MB | #1 | 12%  2025-05-07T19:46:09.0091384Z 2025-05-07T19:46:09.0091668Z 2025-05-07T19:46:09.0091673Z 2025-05-07T19:46:09.0091678Z 2025-05-07T19:46:09.0091683Z 2025-05-07T19:46:09.0697908Z libnpp-12.3.3.65 | 130.6 MB | 9 | 9%  2025-05-07T19:46:09.1051266Z libcublas-12.8.3.14 | 460.2 MB | ####2 | 43% 2025-05-07T19:46:09.1051599Z 2025-05-07T19:46:09.1051605Z 2025-05-07T19:46:09.1051609Z 2025-05-07T19:46:09.1051613Z 2025-05-07T19:46:09.1051617Z 2025-05-07T19:46:09.1051621Z 2025-05-07T19:46:09.1091775Z cuda-nsight-12.8.55 | 113.2 MB | #7 | 18%  2025-05-07T19:46:09.1092110Z 2025-05-07T19:46:09.1092115Z 2025-05-07T19:46:09.1092120Z 2025-05-07T19:46:09.1092125Z 2025-05-07T19:46:09.1092130Z 2025-05-07T19:46:09.1180839Z libnpp-12.3.3.65 | 130.6 MB | #4 | 14%  2025-05-07T19:46:09.1181152Z 2025-05-07T19:46:09.1697982Z nsight-compute-2025. | 320.6 MB | ######2 | 63%  2025-05-07T19:46:09.2051241Z libcublas-12.8.3.14 | 460.2 MB | ####4 | 44% 2025-05-07T19:46:09.2051581Z 2025-05-07T19:46:09.2051776Z 2025-05-07T19:46:09.2051785Z 2025-05-07T19:46:09.2051791Z 2025-05-07T19:46:09.2051796Z 2025-05-07T19:46:09.2051805Z 2025-05-07T19:46:09.2092970Z cuda-nsight-12.8.55 | 113.2 MB | ##4 | 25%  2025-05-07T19:46:09.2093395Z 2025-05-07T19:46:09.2093401Z 2025-05-07T19:46:09.2093406Z 2025-05-07T19:46:09.2093409Z 2025-05-07T19:46:09.2093413Z 2025-05-07T19:46:09.2371751Z libnpp-12.3.3.65 | 130.6 MB | #9 | 19%  2025-05-07T19:46:09.2372106Z 2025-05-07T19:46:09.2962090Z nsight-compute-2025. | 320.6 MB | ######4 | 64%  2025-05-07T19:46:09.3059923Z libcublas-12.8.3.14 | 460.2 MB | ####5 | 45% 2025-05-07T19:46:09.3060213Z 2025-05-07T19:46:09.3060243Z 2025-05-07T19:46:09.3060377Z 2025-05-07T19:46:09.3060387Z 2025-05-07T19:46:09.3060393Z 2025-05-07T19:46:09.3060399Z 2025-05-07T19:46:09.3220907Z cuda-nsight-12.8.55 | 113.2 MB | ### | 30%  2025-05-07T19:46:09.3221264Z 2025-05-07T19:46:09.3221305Z 2025-05-07T19:46:09.3221310Z 2025-05-07T19:46:09.3221315Z 2025-05-07T19:46:09.3221320Z 2025-05-07T19:46:09.3417217Z libnpp-12.3.3.65 | 130.6 MB | ##3 | 24%  2025-05-07T19:46:09.3417538Z 2025-05-07T19:46:09.4094411Z nsight-compute-2025. | 320.6 MB | ######5 | 65%  2025-05-07T19:46:09.4094759Z 2025-05-07T19:46:09.4094765Z 2025-05-07T19:46:09.4094769Z 2025-05-07T19:46:09.4094773Z 2025-05-07T19:46:09.4094777Z 2025-05-07T19:46:09.4094780Z 2025-05-07T19:46:09.4137122Z cuda-nsight-12.8.55 | 113.2 MB | ###5 | 36%  2025-05-07T19:46:09.4222576Z libcublas-12.8.3.14 | 460.2 MB | ####6 | 46% 2025-05-07T19:46:09.4222899Z 2025-05-07T19:46:09.4222906Z 2025-05-07T19:46:09.4222912Z 2025-05-07T19:46:09.4222917Z 2025-05-07T19:46:09.4222923Z 2025-05-07T19:46:09.4423490Z libnpp-12.3.3.65 | 130.6 MB | ##8 | 28%  2025-05-07T19:46:09.4423840Z 2025-05-07T19:46:09.5098156Z nsight-compute-2025. | 320.6 MB | ######6 | 67%  2025-05-07T19:46:09.5098525Z 2025-05-07T19:46:09.5098530Z 2025-05-07T19:46:09.5098535Z 2025-05-07T19:46:09.5098538Z 2025-05-07T19:46:09.5098542Z 2025-05-07T19:46:09.5098545Z 2025-05-07T19:46:09.5137372Z cuda-nsight-12.8.55 | 113.2 MB | ####1 | 42%  2025-05-07T19:46:09.5224918Z libcublas-12.8.3.14 | 460.2 MB | ####7 | 48% 2025-05-07T19:46:09.5225245Z 2025-05-07T19:46:09.5225250Z 2025-05-07T19:46:09.5225254Z 2025-05-07T19:46:09.5225258Z 2025-05-07T19:46:09.5225261Z 2025-05-07T19:46:09.5477998Z libnpp-12.3.3.65 | 130.6 MB | ###2 | 33%  2025-05-07T19:46:09.5478350Z 2025-05-07T19:46:09.6100853Z nsight-compute-2025. | 320.6 MB | ######8 | 68%  2025-05-07T19:46:09.6101696Z 2025-05-07T19:46:09.6101710Z 2025-05-07T19:46:09.6101743Z 2025-05-07T19:46:09.6101755Z 2025-05-07T19:46:09.6101765Z 2025-05-07T19:46:09.6101776Z 2025-05-07T19:46:09.6141315Z cuda-nsight-12.8.55 | 113.2 MB | ####7 | 47%  2025-05-07T19:46:09.6230645Z libcublas-12.8.3.14 | 460.2 MB | ####8 | 49% 2025-05-07T19:46:09.6231200Z 2025-05-07T19:46:09.6231217Z 2025-05-07T19:46:09.6231225Z 2025-05-07T19:46:09.6231233Z 2025-05-07T19:46:09.6231278Z 2025-05-07T19:46:09.6541290Z libnpp-12.3.3.65 | 130.6 MB | ###7 | 37%  2025-05-07T19:46:09.6541871Z 2025-05-07T19:46:09.7213788Z nsight-compute-2025. | 320.6 MB | ######9 | 70%  2025-05-07T19:46:09.7214136Z 2025-05-07T19:46:09.7214140Z 2025-05-07T19:46:09.7214144Z 2025-05-07T19:46:09.7214148Z 2025-05-07T19:46:09.7214151Z 2025-05-07T19:46:09.7214156Z 2025-05-07T19:46:09.7257868Z cuda-nsight-12.8.55 | 113.2 MB | #####2 | 53%  2025-05-07T19:46:09.7259664Z libcublas-12.8.3.14 | 460.2 MB | ####9 | 50% 2025-05-07T19:46:09.7260191Z 2025-05-07T19:46:09.7260213Z 2025-05-07T19:46:09.7260219Z 2025-05-07T19:46:09.7260228Z 2025-05-07T19:46:09.7262002Z 2025-05-07T19:46:09.7541660Z libnpp-12.3.3.65 | 130.6 MB | ####1 | 42%  2025-05-07T19:46:09.7542070Z 2025-05-07T19:46:09.8258491Z nsight-compute-2025. | 320.6 MB | #######1 | 71%  2025-05-07T19:46:09.8322453Z libcublas-12.8.3.14 | 460.2 MB | ##### | 51% 2025-05-07T19:46:09.8322863Z 2025-05-07T19:46:09.8322868Z 2025-05-07T19:46:09.8322872Z 2025-05-07T19:46:09.8322876Z 2025-05-07T19:46:09.8322880Z 2025-05-07T19:46:09.8322883Z 2025-05-07T19:46:09.8348453Z cuda-nsight-12.8.55 | 113.2 MB | #####8 | 58%  2025-05-07T19:46:09.8348869Z 2025-05-07T19:46:09.8348874Z 2025-05-07T19:46:09.8348877Z 2025-05-07T19:46:09.8348882Z 2025-05-07T19:46:09.8348894Z 2025-05-07T19:46:09.8545483Z libnpp-12.3.3.65 | 130.6 MB | ####6 | 46%  2025-05-07T19:46:09.8546832Z 2025-05-07T19:46:09.9323217Z nsight-compute-2025. | 320.6 MB | #######3 | 73%  2025-05-07T19:46:09.9324897Z libcublas-12.8.3.14 | 460.2 MB | #####2 | 52% 2025-05-07T19:46:09.9325664Z 2025-05-07T19:46:09.9325714Z 2025-05-07T19:46:09.9325726Z 2025-05-07T19:46:09.9325737Z 2025-05-07T19:46:09.9325747Z 2025-05-07T19:46:09.9325757Z 2025-05-07T19:46:09.9397032Z cuda-nsight-12.8.55 | 113.2 MB | ######3 | 64%  2025-05-07T19:46:09.9397510Z 2025-05-07T19:46:09.9397515Z 2025-05-07T19:46:09.9397519Z 2025-05-07T19:46:09.9397522Z 2025-05-07T19:46:09.9397526Z 2025-05-07T19:46:09.9548338Z libnpp-12.3.3.65 | 130.6 MB | ##### | 51%  2025-05-07T19:46:09.9549533Z 2025-05-07T19:46:10.0355241Z nsight-compute-2025. | 320.6 MB | #######4 | 75%  2025-05-07T19:46:10.0364194Z libcublas-12.8.3.14 | 460.2 MB | #####3 | 53% 2025-05-07T19:46:10.0364720Z 2025-05-07T19:46:10.0364731Z 2025-05-07T19:46:10.0364740Z 2025-05-07T19:46:10.0364763Z 2025-05-07T19:46:10.0364772Z 2025-05-07T19:46:10.0364782Z 2025-05-07T19:46:10.0445922Z cuda-nsight-12.8.55 | 113.2 MB | ######8 | 69%  2025-05-07T19:46:10.0446894Z 2025-05-07T19:46:10.0446908Z 2025-05-07T19:46:10.0446957Z 2025-05-07T19:46:10.0446969Z 2025-05-07T19:46:10.0446999Z 2025-05-07T19:46:10.0550804Z libnpp-12.3.3.65 | 130.6 MB | #####4 | 55%  2025-05-07T19:46:10.0551186Z 2025-05-07T19:46:10.1369888Z nsight-compute-2025. | 320.6 MB | #######6 | 77%  2025-05-07T19:46:10.1370249Z 2025-05-07T19:46:10.1370254Z 2025-05-07T19:46:10.1370258Z 2025-05-07T19:46:10.1370261Z 2025-05-07T19:46:10.1370265Z 2025-05-07T19:46:10.1370292Z 2025-05-07T19:46:10.1446218Z cuda-nsight-12.8.55 | 113.2 MB | #######4 | 74%  2025-05-07T19:46:10.1446630Z 2025-05-07T19:46:10.1446638Z 2025-05-07T19:46:10.1446643Z 2025-05-07T19:46:10.1446649Z 2025-05-07T19:46:10.1446652Z 2025-05-07T19:46:10.1486616Z libnpp-12.3.3.65 | 130.6 MB | #####9 | 59%  2025-05-07T19:46:10.1551882Z libcublas-12.8.3.14 | 460.2 MB | #####4 | 54% 2025-05-07T19:46:10.1552197Z 2025-05-07T19:46:10.2398429Z nsight-compute-2025. | 320.6 MB | #######8 | 78%  2025-05-07T19:46:10.2399740Z 2025-05-07T19:46:10.2399755Z 2025-05-07T19:46:10.2399766Z 2025-05-07T19:46:10.2399777Z 2025-05-07T19:46:10.2399789Z 2025-05-07T19:46:10.2399827Z 2025-05-07T19:46:10.2457441Z cuda-nsight-12.8.55 | 113.2 MB | #######9 | 79%  2025-05-07T19:46:10.2457780Z 2025-05-07T19:46:10.2457785Z 2025-05-07T19:46:10.2457789Z 2025-05-07T19:46:10.2457793Z 2025-05-07T19:46:10.2458070Z 2025-05-07T19:46:10.2496026Z libnpp-12.3.3.65 | 130.6 MB | ######3 | 63%  2025-05-07T19:46:10.2552972Z libcublas-12.8.3.14 | 460.2 MB | #####5 | 55% 2025-05-07T19:46:10.2553322Z 2025-05-07T19:46:10.3463195Z nsight-compute-2025. | 320.6 MB | ######## | 80%  2025-05-07T19:46:10.3463526Z 2025-05-07T19:46:10.3463547Z 2025-05-07T19:46:10.3463555Z 2025-05-07T19:46:10.3463563Z 2025-05-07T19:46:10.3463580Z 2025-05-07T19:46:10.3513213Z libnpp-12.3.3.65 | 130.6 MB | ######7 | 67%  2025-05-07T19:46:10.3516017Z libcublas-12.8.3.14 | 460.2 MB | #####6 | 56% 2025-05-07T19:46:10.3516315Z 2025-05-07T19:46:10.3516322Z 2025-05-07T19:46:10.3516327Z 2025-05-07T19:46:10.3516332Z 2025-05-07T19:46:10.3516337Z 2025-05-07T19:46:10.3516660Z 2025-05-07T19:46:10.3554261Z cuda-nsight-12.8.55 | 113.2 MB | ########4 | 85%  2025-05-07T19:46:10.3555243Z 2025-05-07T19:46:10.4464341Z nsight-compute-2025. | 320.6 MB | ########1 | 82%  2025-05-07T19:46:10.4465195Z 2025-05-07T19:46:10.4465209Z 2025-05-07T19:46:10.4465243Z 2025-05-07T19:46:10.4465255Z 2025-05-07T19:46:10.4465265Z 2025-05-07T19:46:10.4522229Z libnpp-12.3.3.65 | 130.6 MB | #######1 | 72%  2025-05-07T19:46:10.4522975Z libcublas-12.8.3.14 | 460.2 MB | #####7 | 57% 2025-05-07T19:46:10.4523443Z 2025-05-07T19:46:10.4523449Z 2025-05-07T19:46:10.4523457Z 2025-05-07T19:46:10.4523461Z 2025-05-07T19:46:10.4523465Z 2025-05-07T19:46:10.4523468Z 2025-05-07T19:46:10.4555280Z cuda-nsight-12.8.55 | 113.2 MB | ######### | 90%  2025-05-07T19:46:10.4556402Z 2025-05-07T19:46:10.4886098Z nsight-compute-2025. | 320.6 MB | ########3 | 84%  2025-05-07T19:46:10.4886470Z 2025-05-07T19:46:10.4886476Z 2025-05-07T19:46:10.5197158Z libcusparse-12.5.7.5 | 164.9 MB | ########## | 100%  2025-05-07T19:46:10.5198407Z 2025-05-07T19:46:10.5198412Z 2025-05-07T19:46:10.5198416Z 2025-05-07T19:46:10.5198420Z 2025-05-07T19:46:10.5198423Z 2025-05-07T19:46:10.5198427Z 2025-05-07T19:46:10.5198430Z 2025-05-07T19:46:10.5575488Z cuda-nvvp-12.8.57 | 112.4 MB | | 0%  2025-05-07T19:46:10.5576657Z 2025-05-07T19:46:10.5576671Z 2025-05-07T19:46:10.5576683Z 2025-05-07T19:46:10.5576694Z 2025-05-07T19:46:10.5576704Z 2025-05-07T19:46:10.5628424Z libnpp-12.3.3.65 | 130.6 MB | #######5 | 76%  2025-05-07T19:46:10.5631249Z libcublas-12.8.3.14 | 460.2 MB | #####8 | 58% 2025-05-07T19:46:10.5631773Z 2025-05-07T19:46:10.5631782Z 2025-05-07T19:46:10.5631820Z 2025-05-07T19:46:10.5631827Z 2025-05-07T19:46:10.5631836Z 2025-05-07T19:46:10.5632597Z 2025-05-07T19:46:10.5635505Z cuda-nsight-12.8.55 | 113.2 MB | #########5 | 95%  2025-05-07T19:46:10.5635876Z 2025-05-07T19:46:10.6196819Z nsight-compute-2025. | 320.6 MB | ########5 | 85%  2025-05-07T19:46:10.6197310Z 2025-05-07T19:46:10.6197320Z 2025-05-07T19:46:10.6197328Z 2025-05-07T19:46:10.6197336Z 2025-05-07T19:46:10.6197342Z 2025-05-07T19:46:10.6197348Z 2025-05-07T19:46:10.6197356Z 2025-05-07T19:46:10.6731506Z cuda-nvvp-12.8.57 | 112.4 MB | 3 | 3%  2025-05-07T19:46:10.6732499Z 2025-05-07T19:46:10.6732513Z 2025-05-07T19:46:10.6732525Z 2025-05-07T19:46:10.6732538Z 2025-05-07T19:46:10.6732549Z 2025-05-07T19:46:10.6757569Z libnpp-12.3.3.65 | 130.6 MB | ######## | 80%  2025-05-07T19:46:10.6895286Z libcublas-12.8.3.14 | 460.2 MB | #####9 | 59% 2025-05-07T19:46:10.6896109Z 2025-05-07T19:46:10.7196662Z nsight-compute-2025. | 320.6 MB | ########7 | 87%  2025-05-07T19:46:10.7197356Z 2025-05-07T19:46:10.7197362Z 2025-05-07T19:46:10.7197365Z 2025-05-07T19:46:10.7197369Z 2025-05-07T19:46:10.7197373Z 2025-05-07T19:46:10.7197378Z 2025-05-07T19:46:10.7197399Z 2025-05-07T19:46:10.7772832Z cuda-nvvp-12.8.57 | 112.4 MB | 7 | 7%  2025-05-07T19:46:10.7773276Z 2025-05-07T19:46:10.7773281Z 2025-05-07T19:46:10.7773285Z 2025-05-07T19:46:10.7773289Z 2025-05-07T19:46:10.7773293Z 2025-05-07T19:46:10.7780066Z libnpp-12.3.3.65 | 130.6 MB | ########3 | 84%  2025-05-07T19:46:10.7901681Z libcublas-12.8.3.14 | 460.2 MB | ###### | 60% 2025-05-07T19:46:10.7902487Z 2025-05-07T19:46:10.8202248Z nsight-compute-2025. | 320.6 MB | ########8 | 89%  2025-05-07T19:46:10.8203102Z 2025-05-07T19:46:10.8203116Z 2025-05-07T19:46:10.8203128Z 2025-05-07T19:46:10.8203160Z 2025-05-07T19:46:10.8203171Z 2025-05-07T19:46:10.8203182Z 2025-05-07T19:46:10.8203193Z 2025-05-07T19:46:10.8776334Z cuda-nvvp-12.8.57 | 112.4 MB | #1 | 12%  2025-05-07T19:46:10.8776688Z 2025-05-07T19:46:10.8776693Z 2025-05-07T19:46:10.8776698Z 2025-05-07T19:46:10.8776701Z 2025-05-07T19:46:10.8777073Z 2025-05-07T19:46:10.8780915Z libnpp-12.3.3.65 | 130.6 MB | ########7 | 88%  2025-05-07T19:46:10.8903341Z libcublas-12.8.3.14 | 460.2 MB | ######1 | 61% 2025-05-07T19:46:10.8904499Z 2025-05-07T19:46:10.9203860Z nsight-compute-2025. | 320.6 MB | ######### | 90%  2025-05-07T19:46:10.9204204Z 2025-05-07T19:46:10.9204209Z 2025-05-07T19:46:10.9204213Z 2025-05-07T19:46:10.9204217Z 2025-05-07T19:46:10.9204221Z 2025-05-07T19:46:10.9204225Z 2025-05-07T19:46:10.9204229Z 2025-05-07T19:46:10.9785449Z cuda-nvvp-12.8.57 | 112.4 MB | #5 | 16%  2025-05-07T19:46:10.9831167Z libcublas-12.8.3.14 | 460.2 MB | ######2 | 62% 2025-05-07T19:46:10.9831492Z 2025-05-07T19:46:10.9831499Z 2025-05-07T19:46:10.9831547Z 2025-05-07T19:46:10.9831553Z 2025-05-07T19:46:10.9831567Z 2025-05-07T19:46:10.9913747Z libnpp-12.3.3.65 | 130.6 MB | #########1 | 92%  2025-05-07T19:46:10.9914074Z 2025-05-07T19:46:11.0208119Z nsight-compute-2025. | 320.6 MB | #########1 | 92%  2025-05-07T19:46:11.0208474Z 2025-05-07T19:46:11.0208480Z 2025-05-07T19:46:11.0208484Z 2025-05-07T19:46:11.0208488Z 2025-05-07T19:46:11.0208491Z 2025-05-07T19:46:11.0208495Z 2025-05-07T19:46:11.0208499Z 2025-05-07T19:46:11.0857927Z cuda-nvvp-12.8.57 | 112.4 MB | #9 | 20%  2025-05-07T19:46:11.0914105Z libcublas-12.8.3.14 | 460.2 MB | ######3 | 63% 2025-05-07T19:46:11.0914430Z 2025-05-07T19:46:11.0914556Z 2025-05-07T19:46:11.0914562Z 2025-05-07T19:46:11.0914616Z 2025-05-07T19:46:11.0914805Z 2025-05-07T19:46:11.0937699Z libnpp-12.3.3.65 | 130.6 MB | #########5 | 96%  2025-05-07T19:46:11.1210847Z 2025-05-07T19:46:11.1211370Z nsight-compute-2025. | 320.6 MB | #########3 | 93%  2025-05-07T19:46:11.1211712Z 2025-05-07T19:46:11.1211718Z 2025-05-07T19:46:11.1211723Z 2025-05-07T19:46:11.1211728Z 2025-05-07T19:46:11.1211733Z 2025-05-07T19:46:11.1211738Z 2025-05-07T19:46:11.1211741Z 2025-05-07T19:46:11.1858005Z cuda-nvvp-12.8.57 | 112.4 MB | ##4 | 24%  2025-05-07T19:46:11.1937441Z libcublas-12.8.3.14 | 460.2 MB | ######4 | 64% 2025-05-07T19:46:11.1937750Z 2025-05-07T19:46:11.1972068Z nsight-compute-2025. | 320.6 MB | #########5 | 95%  2025-05-07T19:46:11.1972365Z 2025-05-07T19:46:11.1972636Z 2025-05-07T19:46:11.1972645Z 2025-05-07T19:46:11.1972650Z 2025-05-07T19:46:11.1972655Z 2025-05-07T19:46:11.2212148Z libnpp-12.3.3.65 | 130.6 MB | #########9 | 99%  2025-05-07T19:46:11.2212462Z 2025-05-07T19:46:11.2212468Z 2025-05-07T19:46:11.2212472Z 2025-05-07T19:46:11.2212478Z 2025-05-07T19:46:11.2212496Z 2025-05-07T19:46:11.2212500Z 2025-05-07T19:46:11.2212503Z 2025-05-07T19:46:11.2859096Z cuda-nvvp-12.8.57 | 112.4 MB | ##8 | 29%  2025-05-07T19:46:11.2939503Z libcublas-12.8.3.14 | 460.2 MB | ######5 | 65% 2025-05-07T19:46:11.2944035Z 2025-05-07T19:46:11.3213329Z nsight-compute-2025. | 320.6 MB | #########6 | 97%  2025-05-07T19:46:11.3214184Z 2025-05-07T19:46:11.3214228Z 2025-05-07T19:46:11.3214240Z 2025-05-07T19:46:11.3214252Z 2025-05-07T19:46:11.3214264Z 2025-05-07T19:46:11.3214307Z 2025-05-07T19:46:11.3214318Z 2025-05-07T19:46:11.3859419Z cuda-nvvp-12.8.57 | 112.4 MB | ###4 | 34%  2025-05-07T19:46:11.3941219Z libcublas-12.8.3.14 | 460.2 MB | ######6 | 66% 2025-05-07T19:46:11.3942027Z 2025-05-07T19:46:11.4212624Z nsight-compute-2025. | 320.6 MB | #########8 | 99%  2025-05-07T19:46:11.4212973Z 2025-05-07T19:46:11.4212979Z 2025-05-07T19:46:11.4212985Z 2025-05-07T19:46:11.4212992Z 2025-05-07T19:46:11.4212998Z 2025-05-07T19:46:11.4213007Z 2025-05-07T19:46:11.4213013Z 2025-05-07T19:46:11.4862678Z cuda-nvvp-12.8.57 | 112.4 MB | ###9 | 40%  2025-05-07T19:46:11.5214701Z libcublas-12.8.3.14 | 460.2 MB | ######7 | 67% 2025-05-07T19:46:11.5214995Z 2025-05-07T19:46:11.5215001Z 2025-05-07T19:46:11.5215005Z 2025-05-07T19:46:11.5215376Z 2025-05-07T19:46:11.5215386Z 2025-05-07T19:46:11.5215391Z 2025-05-07T19:46:11.5215395Z 2025-05-07T19:46:11.5864867Z cuda-nvvp-12.8.57 | 112.4 MB | ####5 | 45%  2025-05-07T19:46:11.6215098Z libcublas-12.8.3.14 | 460.2 MB | ######8 | 69% 2025-05-07T19:46:11.6215644Z 2025-05-07T19:46:11.6215658Z 2025-05-07T19:46:11.6215683Z 2025-05-07T19:46:11.6215689Z 2025-05-07T19:46:11.6215696Z 2025-05-07T19:46:11.6215701Z 2025-05-07T19:46:11.6215706Z 2025-05-07T19:46:11.6872601Z cuda-nvvp-12.8.57 | 112.4 MB | #####2 | 52%  2025-05-07T19:46:11.7215322Z libcublas-12.8.3.14 | 460.2 MB | ####### | 70% 2025-05-07T19:46:11.7215642Z 2025-05-07T19:46:11.7215923Z 2025-05-07T19:46:11.7215975Z 2025-05-07T19:46:11.7215983Z 2025-05-07T19:46:11.7215990Z 2025-05-07T19:46:11.7215996Z 2025-05-07T19:46:11.7216002Z 2025-05-07T19:46:11.7875611Z cuda-nvvp-12.8.57 | 112.4 MB | #####9 | 59%  2025-05-07T19:46:11.8218031Z libcublas-12.8.3.14 | 460.2 MB | #######2 | 72% 2025-05-07T19:46:11.8218329Z 2025-05-07T19:46:11.8218334Z 2025-05-07T19:46:11.8218338Z 2025-05-07T19:46:11.8218342Z 2025-05-07T19:46:11.8218345Z 2025-05-07T19:46:11.8218348Z 2025-05-07T19:46:11.8218352Z 2025-05-07T19:46:11.8876640Z cuda-nvvp-12.8.57 | 112.4 MB | ######6 | 66%  2025-05-07T19:46:11.9218790Z libcublas-12.8.3.14 | 460.2 MB | #######3 | 74% 2025-05-07T19:46:11.9219104Z 2025-05-07T19:46:11.9219109Z 2025-05-07T19:46:11.9219114Z 2025-05-07T19:46:11.9219118Z 2025-05-07T19:46:11.9219122Z 2025-05-07T19:46:11.9219125Z 2025-05-07T19:46:11.9219129Z 2025-05-07T19:46:11.9878231Z cuda-nvvp-12.8.57 | 112.4 MB | #######2 | 73%  2025-05-07T19:46:12.0219881Z libcublas-12.8.3.14 | 460.2 MB | #######5 | 75% 2025-05-07T19:46:12.0220175Z 2025-05-07T19:46:12.0220180Z 2025-05-07T19:46:12.0220183Z 2025-05-07T19:46:12.0220187Z 2025-05-07T19:46:12.0220191Z 2025-05-07T19:46:12.0220218Z 2025-05-07T19:46:12.0220236Z 2025-05-07T19:46:12.0882123Z cuda-nvvp-12.8.57 | 112.4 MB | #######9 | 80%  2025-05-07T19:46:12.1220270Z libcublas-12.8.3.14 | 460.2 MB | #######6 | 77% 2025-05-07T19:46:12.1220561Z 2025-05-07T19:46:12.1220567Z 2025-05-07T19:46:12.1220571Z 2025-05-07T19:46:12.1220574Z 2025-05-07T19:46:12.1220578Z 2025-05-07T19:46:12.1220582Z 2025-05-07T19:46:12.1220720Z 2025-05-07T19:46:12.1882293Z cuda-nvvp-12.8.57 | 112.4 MB | ########6 | 86%  2025-05-07T19:46:12.2221754Z libcublas-12.8.3.14 | 460.2 MB | #######8 | 78% 2025-05-07T19:46:12.2222056Z 2025-05-07T19:46:12.2222061Z 2025-05-07T19:46:12.2222065Z 2025-05-07T19:46:12.2222069Z 2025-05-07T19:46:12.2222325Z 2025-05-07T19:46:12.2222330Z 2025-05-07T19:46:12.2222350Z 2025-05-07T19:46:12.2881539Z cuda-nvvp-12.8.57 | 112.4 MB | #########3 | 93%  2025-05-07T19:46:12.3883387Z libcublas-12.8.3.14 | 460.2 MB | #######9 | 80% 2025-05-07T19:46:12.4241648Z libcublas-12.8.3.14 | 460.2 MB | ########1 | 82% 2025-05-07T19:46:12.4242033Z 2025-05-07T19:46:12.4242124Z 2025-05-07T19:46:12.4242138Z 2025-05-07T19:46:12.4242157Z 2025-05-07T19:46:12.4883576Z libcufft-11.3.3.41 | 147.4 MB | ########## | 100%  2025-05-07T19:46:12.5310805Z libcublas-12.8.3.14 | 460.2 MB | ########3 | 84% 2025-05-07T19:46:12.5311176Z 2025-05-07T19:46:12.5311228Z 2025-05-07T19:46:12.5311232Z 2025-05-07T19:46:12.5311521Z 2025-05-07T19:46:12.5311541Z 2025-05-07T19:46:12.5311551Z 2025-05-07T19:46:12.5745317Z cuda-nsight-12.8.55 | 113.2 MB | ########## | 100%  2025-05-07T19:46:12.5745739Z 2025-05-07T19:46:12.5745748Z 2025-05-07T19:46:12.5745754Z 2025-05-07T19:46:12.5745830Z 2025-05-07T19:46:12.5745835Z 2025-05-07T19:46:12.5745840Z 2025-05-07T19:46:12.5745845Z 2025-05-07T19:46:12.5745850Z 2025-05-07T19:46:12.5881416Z cuda-nvrtc-12.8.61 | 63.1 MB | | 0%  2025-05-07T19:46:12.6745907Z libcublas-12.8.3.14 | 460.2 MB | ########5 | 86% 2025-05-07T19:46:12.6746243Z 2025-05-07T19:46:12.6746248Z 2025-05-07T19:46:12.6746252Z 2025-05-07T19:46:12.6746255Z 2025-05-07T19:46:12.6746259Z 2025-05-07T19:46:12.6746263Z 2025-05-07T19:46:12.6746266Z 2025-05-07T19:46:12.6746269Z 2025-05-07T19:46:12.7196631Z cuda-nvrtc-12.8.61 | 63.1 MB | #3 | 14%  2025-05-07T19:46:12.7745455Z libcublas-12.8.3.14 | 460.2 MB | ########7 | 88% 2025-05-07T19:46:12.7745762Z 2025-05-07T19:46:12.7745768Z 2025-05-07T19:46:12.7745772Z 2025-05-07T19:46:12.7745776Z 2025-05-07T19:46:12.7745782Z 2025-05-07T19:46:12.7745786Z 2025-05-07T19:46:12.7745791Z 2025-05-07T19:46:12.7745797Z 2025-05-07T19:46:12.8258957Z cuda-nvrtc-12.8.61 | 63.1 MB | ##6 | 26%  2025-05-07T19:46:12.8746350Z libcublas-12.8.3.14 | 460.2 MB | ########9 | 90% 2025-05-07T19:46:12.8746748Z 2025-05-07T19:46:12.8746754Z 2025-05-07T19:46:12.8746791Z 2025-05-07T19:46:12.8746794Z 2025-05-07T19:46:12.8746799Z 2025-05-07T19:46:12.8746802Z 2025-05-07T19:46:12.8746806Z 2025-05-07T19:46:12.8746809Z 2025-05-07T19:46:12.9372321Z cuda-nvrtc-12.8.61 | 63.1 MB | ###8 | 39%  2025-05-07T19:46:12.9746641Z libcublas-12.8.3.14 | 460.2 MB | #########1 | 91% 2025-05-07T19:46:12.9746949Z 2025-05-07T19:46:12.9746955Z 2025-05-07T19:46:12.9746959Z 2025-05-07T19:46:12.9746963Z 2025-05-07T19:46:12.9746966Z 2025-05-07T19:46:12.9746993Z 2025-05-07T19:46:12.9746998Z 2025-05-07T19:46:12.9747002Z 2025-05-07T19:46:13.0434876Z cuda-nvrtc-12.8.61 | 63.1 MB | ##### | 50%  2025-05-07T19:46:13.0747463Z libcublas-12.8.3.14 | 460.2 MB | #########2 | 93% 2025-05-07T19:46:13.0747833Z 2025-05-07T19:46:13.0747839Z 2025-05-07T19:46:13.0747845Z 2025-05-07T19:46:13.0747851Z 2025-05-07T19:46:13.0747855Z 2025-05-07T19:46:13.0747859Z 2025-05-07T19:46:13.0747863Z 2025-05-07T19:46:13.0747985Z 2025-05-07T19:46:13.1452511Z cuda-nvrtc-12.8.61 | 63.1 MB | ######2 | 63%  2025-05-07T19:46:13.1749710Z libcublas-12.8.3.14 | 460.2 MB | #########4 | 95% 2025-05-07T19:46:13.1750285Z 2025-05-07T19:46:13.1750302Z 2025-05-07T19:46:13.1750309Z 2025-05-07T19:46:13.1750315Z 2025-05-07T19:46:13.1750321Z 2025-05-07T19:46:13.1750325Z 2025-05-07T19:46:13.1750331Z 2025-05-07T19:46:13.1750336Z 2025-05-07T19:46:13.2038208Z cuda-nvrtc-12.8.61 | 63.1 MB | #######4 | 75%  2025-05-07T19:46:13.2038659Z 2025-05-07T19:46:13.2038665Z 2025-05-07T19:46:13.2038672Z 2025-05-07T19:46:13.2038678Z 2025-05-07T19:46:13.2038684Z 2025-05-07T19:46:13.2449730Z libnpp-12.3.3.65 | 130.6 MB | ########## | 100%  2025-05-07T19:46:13.2749099Z libcublas-12.8.3.14 | 460.2 MB | #########6 | 96% 2025-05-07T19:46:13.2749436Z 2025-05-07T19:46:13.2749464Z 2025-05-07T19:46:13.2749468Z 2025-05-07T19:46:13.2749473Z 2025-05-07T19:46:13.2749478Z 2025-05-07T19:46:13.2749513Z 2025-05-07T19:46:13.2749518Z 2025-05-07T19:46:13.2749524Z 2025-05-07T19:46:13.2846819Z cuda-nvrtc-12.8.61 | 63.1 MB | ########7 | 88%  2025-05-07T19:46:13.2847213Z 2025-05-07T19:46:13.2847218Z 2025-05-07T19:46:13.2847221Z 2025-05-07T19:46:13.2847225Z 2025-05-07T19:46:13.2847229Z 2025-05-07T19:46:13.2847232Z 2025-05-07T19:46:13.2847237Z 2025-05-07T19:46:13.2847242Z 2025-05-07T19:46:13.2847245Z 2025-05-07T19:46:13.3589618Z libcurand-10.3.9.55 | 43.6 MB | | 0%  2025-05-07T19:46:13.3836465Z libcublas-12.8.3.14 | 460.2 MB | #########7 | 98% 2025-05-07T19:46:13.3836783Z 2025-05-07T19:46:13.3836789Z 2025-05-07T19:46:13.3836792Z 2025-05-07T19:46:13.3836796Z 2025-05-07T19:46:13.3836862Z 2025-05-07T19:46:13.3836866Z 2025-05-07T19:46:13.3836869Z 2025-05-07T19:46:13.3836875Z 2025-05-07T19:46:13.3854382Z cuda-nvrtc-12.8.61 | 63.1 MB | #########9 | 100%  2025-05-07T19:46:13.3856086Z 2025-05-07T19:46:13.3856106Z 2025-05-07T19:46:13.3856118Z 2025-05-07T19:46:13.3856160Z 2025-05-07T19:46:13.3856172Z 2025-05-07T19:46:13.3856182Z 2025-05-07T19:46:13.3856192Z 2025-05-07T19:46:13.3856203Z 2025-05-07T19:46:13.3856213Z 2025-05-07T19:46:13.4597477Z libcurand-10.3.9.55 | 43.6 MB | # | 10%  2025-05-07T19:46:13.4853690Z libcublas-12.8.3.14 | 460.2 MB | #########9 | 99% 2025-05-07T19:46:13.4853993Z 2025-05-07T19:46:13.4854285Z 2025-05-07T19:46:13.4854302Z 2025-05-07T19:46:13.4854310Z 2025-05-07T19:46:13.4854318Z 2025-05-07T19:46:13.4854326Z 2025-05-07T19:46:13.4854332Z 2025-05-07T19:46:13.4854338Z 2025-05-07T19:46:13.4854345Z 2025-05-07T19:46:13.5855497Z libcurand-10.3.9.55 | 43.6 MB | ##3 | 23%  2025-05-07T19:46:13.5855885Z 2025-05-07T19:46:13.5855891Z 2025-05-07T19:46:13.5855895Z 2025-05-07T19:46:13.5855943Z 2025-05-07T19:46:13.5855948Z 2025-05-07T19:46:13.5855954Z 2025-05-07T19:46:13.5855959Z 2025-05-07T19:46:13.5856007Z 2025-05-07T19:46:13.5856011Z 2025-05-07T19:46:13.6856760Z libcurand-10.3.9.55 | 43.6 MB | ####6 | 47%  2025-05-07T19:46:13.6857148Z 2025-05-07T19:46:13.6857154Z 2025-05-07T19:46:13.6857160Z 2025-05-07T19:46:13.6857164Z 2025-05-07T19:46:13.6857168Z 2025-05-07T19:46:13.6857172Z 2025-05-07T19:46:13.6857197Z 2025-05-07T19:46:13.6857203Z 2025-05-07T19:46:13.6857210Z 2025-05-07T19:46:13.7385039Z libcurand-10.3.9.55 | 43.6 MB | ######7 | 67%  2025-05-07T19:46:13.7385390Z 2025-05-07T19:46:13.7385415Z 2025-05-07T19:46:13.7385419Z 2025-05-07T19:46:13.7385425Z 2025-05-07T19:46:13.7385429Z 2025-05-07T19:46:13.7385448Z 2025-05-07T19:46:13.7975165Z cuda-nsight-12.8.55 | 113.2 MB | ########## | 100%  2025-05-07T19:46:13.7975565Z 2025-05-07T19:46:13.7975570Z 2025-05-07T19:46:13.7975578Z 2025-05-07T19:46:13.7975582Z 2025-05-07T19:46:13.7975587Z 2025-05-07T19:46:13.7975591Z 2025-05-07T19:46:13.7975615Z 2025-05-07T19:46:13.7975619Z 2025-05-07T19:46:13.7975622Z 2025-05-07T19:46:13.8775994Z libcurand-10.3.9.55 | 43.6 MB | ########3 | 84%  2025-05-07T19:46:13.8776350Z 2025-05-07T19:46:13.8776355Z 2025-05-07T19:46:13.8776360Z 2025-05-07T19:46:13.8776364Z 2025-05-07T19:46:13.8776367Z 2025-05-07T19:46:13.8776371Z 2025-05-07T19:46:13.8776375Z 2025-05-07T19:46:13.9095122Z cuda-nvvp-12.8.57 | 112.4 MB | ########## | 100%  2025-05-07T19:46:13.9095475Z 2025-05-07T19:46:13.9095483Z 2025-05-07T19:46:13.9095487Z 2025-05-07T19:46:13.9095491Z 2025-05-07T19:46:13.9095495Z 2025-05-07T19:46:13.9095499Z 2025-05-07T19:46:13.9095502Z 2025-05-07T19:46:13.9095506Z 2025-05-07T19:46:13.9095509Z 2025-05-07T19:46:13.9095770Z 2025-05-07T19:46:14.0096224Z gds-tools-1.13.0.11 | 37.9 MB | | 0%  2025-05-07T19:46:14.0096577Z 2025-05-07T19:46:14.0096582Z 2025-05-07T19:46:14.0096588Z 2025-05-07T19:46:14.0096625Z 2025-05-07T19:46:14.0096629Z 2025-05-07T19:46:14.0096633Z 2025-05-07T19:46:14.0096637Z 2025-05-07T19:46:14.0096641Z 2025-05-07T19:46:14.0096644Z 2025-05-07T19:46:14.0096647Z 2025-05-07T19:46:14.0342442Z gds-tools-1.13.0.11 | 37.9 MB | ##6 | 27%  2025-05-07T19:46:14.0342793Z 2025-05-07T19:46:14.0342798Z 2025-05-07T19:46:14.0343072Z 2025-05-07T19:46:14.1097509Z libcusolver-11.7.2.5 | 156.9 MB | ########## | 100%  2025-05-07T19:46:14.1097873Z 2025-05-07T19:46:14.1097878Z 2025-05-07T19:46:14.1097883Z 2025-05-07T19:46:14.1097888Z 2025-05-07T19:46:14.1097893Z 2025-05-07T19:46:14.1097897Z 2025-05-07T19:46:14.1097902Z 2025-05-07T19:46:14.1097906Z 2025-05-07T19:46:14.1097912Z 2025-05-07T19:46:14.1097952Z 2025-05-07T19:46:14.1466752Z gds-tools-1.13.0.11 | 37.9 MB | #####3 | 53%  2025-05-07T19:46:14.1467104Z 2025-05-07T19:46:14.1467112Z 2025-05-07T19:46:14.1467117Z 2025-05-07T19:46:14.1467121Z 2025-05-07T19:46:14.1467382Z 2025-05-07T19:46:14.1467388Z 2025-05-07T19:46:14.1467392Z 2025-05-07T19:46:14.1467395Z 2025-05-07T19:46:14.1900974Z cuda-nvrtc-12.8.61 | 63.1 MB | ########## | 100%  2025-05-07T19:46:14.1901298Z 2025-05-07T19:46:14.1901304Z 2025-05-07T19:46:14.1901310Z 2025-05-07T19:46:14.1901315Z 2025-05-07T19:46:14.1901319Z 2025-05-07T19:46:14.1901326Z 2025-05-07T19:46:14.1901330Z 2025-05-07T19:46:14.1901335Z 2025-05-07T19:46:14.1901344Z 2025-05-07T19:46:14.1901349Z 2025-05-07T19:46:14.1901353Z 2025-05-07T19:46:14.2097166Z libnvjitlink-12.8.61 | 28.7 MB | | 0%  2025-05-07T19:46:14.2097522Z 2025-05-07T19:46:14.2097527Z 2025-05-07T19:46:14.2097532Z 2025-05-07T19:46:14.2097568Z 2025-05-07T19:46:14.2097572Z 2025-05-07T19:46:14.2097576Z 2025-05-07T19:46:14.2097581Z 2025-05-07T19:46:14.2097585Z 2025-05-07T19:46:14.2097590Z 2025-05-07T19:46:14.2097594Z 2025-05-07T19:46:14.2904963Z gds-tools-1.13.0.11 | 37.9 MB | #######5 | 75%  2025-05-07T19:46:14.2905308Z 2025-05-07T19:46:14.2905313Z 2025-05-07T19:46:14.2905317Z 2025-05-07T19:46:14.2905321Z 2025-05-07T19:46:14.2905324Z 2025-05-07T19:46:14.2905327Z 2025-05-07T19:46:14.2905331Z 2025-05-07T19:46:14.2905334Z 2025-05-07T19:46:14.2905337Z 2025-05-07T19:46:14.2905340Z 2025-05-07T19:46:14.2905344Z 2025-05-07T19:46:14.3462660Z libnvjitlink-12.8.61 | 28.7 MB | ##3 | 24%  2025-05-07T19:46:14.3463004Z 2025-05-07T19:46:14.3463009Z 2025-05-07T19:46:14.3463014Z 2025-05-07T19:46:14.3463019Z 2025-05-07T19:46:14.3463024Z 2025-05-07T19:46:14.3463029Z 2025-05-07T19:46:14.3463034Z 2025-05-07T19:46:14.3463041Z 2025-05-07T19:46:14.3463046Z 2025-05-07T19:46:14.3463100Z 2025-05-07T19:46:14.3905814Z gds-tools-1.13.0.11 | 37.9 MB | #########6 | 96%  2025-05-07T19:46:14.3906157Z 2025-05-07T19:46:14.3906162Z 2025-05-07T19:46:14.3906167Z 2025-05-07T19:46:14.3906195Z 2025-05-07T19:46:14.3906198Z 2025-05-07T19:46:14.3906202Z 2025-05-07T19:46:14.3906206Z 2025-05-07T19:46:14.3906223Z 2025-05-07T19:46:14.3906227Z 2025-05-07T19:46:14.3906230Z 2025-05-07T19:46:14.3906233Z 2025-05-07T19:46:14.4598671Z libnvjitlink-12.8.61 | 28.7 MB | ####8 | 49%  2025-05-07T19:46:14.4599105Z 2025-05-07T19:46:14.4599112Z 2025-05-07T19:46:14.4599118Z 2025-05-07T19:46:14.4599124Z 2025-05-07T19:46:14.4599128Z 2025-05-07T19:46:14.4599133Z 2025-05-07T19:46:14.4599139Z 2025-05-07T19:46:14.4599144Z 2025-05-07T19:46:14.4599175Z 2025-05-07T19:46:14.4905920Z libcurand-10.3.9.55 | 43.6 MB | ########## | 100%  2025-05-07T19:46:14.4906274Z 2025-05-07T19:46:14.4906284Z 2025-05-07T19:46:14.4906570Z 2025-05-07T19:46:14.4906575Z 2025-05-07T19:46:14.4906580Z 2025-05-07T19:46:14.4906584Z 2025-05-07T19:46:14.4906589Z 2025-05-07T19:46:14.4906620Z 2025-05-07T19:46:14.4906625Z 2025-05-07T19:46:14.4906630Z 2025-05-07T19:46:14.4906655Z 2025-05-07T19:46:14.4989514Z libnvjitlink-12.8.61 | 28.7 MB | #######8 | 78%  2025-05-07T19:46:14.4989874Z 2025-05-07T19:46:14.4992561Z 2025-05-07T19:46:14.4992577Z 2025-05-07T19:46:14.4992585Z 2025-05-07T19:46:14.4992591Z 2025-05-07T19:46:14.4992596Z 2025-05-07T19:46:14.4992601Z 2025-05-07T19:46:14.4992605Z 2025-05-07T19:46:14.4992609Z 2025-05-07T19:46:14.4992616Z 2025-05-07T19:46:14.4992622Z 2025-05-07T19:46:14.4992627Z 2025-05-07T19:46:14.5992392Z cuda-nvcc-tools-12.8 | 24.5 MB | | 0%  2025-05-07T19:46:14.5992780Z 2025-05-07T19:46:14.5992786Z 2025-05-07T19:46:14.5992791Z 2025-05-07T19:46:14.5992796Z 2025-05-07T19:46:14.5992804Z 2025-05-07T19:46:14.5992845Z 2025-05-07T19:46:14.5992850Z 2025-05-07T19:46:14.5992854Z 2025-05-07T19:46:14.5992873Z 2025-05-07T19:46:14.5992878Z 2025-05-07T19:46:14.5992882Z 2025-05-07T19:46:14.6993036Z 2025-05-07T19:46:14.6994099Z cuda-nvcc-tools-12.8 | 24.5 MB | ###2 | 32%  2025-05-07T19:46:14.6994477Z 2025-05-07T19:46:14.6994500Z 2025-05-07T19:46:14.6994505Z 2025-05-07T19:46:14.6994521Z 2025-05-07T19:46:14.6994526Z 2025-05-07T19:46:14.6994530Z 2025-05-07T19:46:14.6994534Z 2025-05-07T19:46:14.6994538Z 2025-05-07T19:46:14.6994542Z 2025-05-07T19:46:14.6994545Z 2025-05-07T19:46:14.6994549Z 2025-05-07T19:46:14.6994552Z 2025-05-07T19:46:14.9358574Z cuda-nvcc-tools-12.8 | 24.5 MB | ######8 | 68%  2025-05-07T19:46:14.9358954Z 2025-05-07T19:46:14.9358960Z 2025-05-07T19:46:14.9358968Z 2025-05-07T19:46:14.9359008Z 2025-05-07T19:46:14.9359013Z 2025-05-07T19:46:14.9359018Z 2025-05-07T19:46:14.9359023Z 2025-05-07T19:46:14.9359063Z 2025-05-07T19:46:14.9359067Z 2025-05-07T19:46:14.9359071Z 2025-05-07T19:46:14.9693656Z gds-tools-1.13.0.11 | 37.9 MB | ########## | 100%  2025-05-07T19:46:14.9694012Z 2025-05-07T19:46:14.9694046Z 2025-05-07T19:46:14.9694050Z 2025-05-07T19:46:14.9694054Z 2025-05-07T19:46:14.9694058Z 2025-05-07T19:46:14.9694062Z 2025-05-07T19:46:14.9694065Z 2025-05-07T19:46:14.9694069Z 2025-05-07T19:46:14.9694072Z 2025-05-07T19:46:14.9694075Z 2025-05-07T19:46:14.9694079Z 2025-05-07T19:46:14.9694082Z 2025-05-07T19:46:14.9694085Z 2025-05-07T19:46:14.9888372Z cuda-nvvm-tools-12.8 | 23.5 MB | | 0%  2025-05-07T19:46:14.9888736Z 2025-05-07T19:46:14.9888744Z 2025-05-07T19:46:15.0293764Z libcusparse-12.5.7.5 | 164.9 MB | ########## | 100%  2025-05-07T19:46:15.0294080Z 2025-05-07T19:46:15.0294101Z 2025-05-07T19:46:15.0294105Z 2025-05-07T19:46:15.0294109Z 2025-05-07T19:46:15.0294113Z 2025-05-07T19:46:15.0294142Z 2025-05-07T19:46:15.0294146Z 2025-05-07T19:46:15.0294149Z 2025-05-07T19:46:15.0294153Z 2025-05-07T19:46:15.0294156Z 2025-05-07T19:46:15.0294160Z 2025-05-07T19:46:15.0703643Z libnvjitlink-12.8.61 | 28.7 MB | ########## | 100%  2025-05-07T19:46:15.0704014Z 2025-05-07T19:46:15.0704019Z 2025-05-07T19:46:15.0704023Z 2025-05-07T19:46:15.0704027Z 2025-05-07T19:46:15.0704031Z 2025-05-07T19:46:15.0704034Z 2025-05-07T19:46:15.0704038Z 2025-05-07T19:46:15.0704041Z 2025-05-07T19:46:15.0704045Z 2025-05-07T19:46:15.0704048Z 2025-05-07T19:46:15.0704051Z 2025-05-07T19:46:15.0704054Z 2025-05-07T19:46:15.0704058Z 2025-05-07T19:46:15.0822569Z cuda-nvvm-tools-12.8 | 23.5 MB | ###4 | 34%  2025-05-07T19:46:15.0822945Z 2025-05-07T19:46:15.0822950Z 2025-05-07T19:46:15.0822954Z 2025-05-07T19:46:15.0822958Z 2025-05-07T19:46:15.0822961Z 2025-05-07T19:46:15.0822965Z 2025-05-07T19:46:15.0822968Z 2025-05-07T19:46:15.0822972Z 2025-05-07T19:46:15.0823222Z 2025-05-07T19:46:15.0823226Z 2025-05-07T19:46:15.0823229Z 2025-05-07T19:46:15.0823233Z 2025-05-07T19:46:15.0823236Z 2025-05-07T19:46:15.0823239Z 2025-05-07T19:46:15.1025553Z cuda-nvvm-impl-12.8. | 20.8 MB | | 0%  2025-05-07T19:46:15.1026593Z 2025-05-07T19:46:15.1026607Z 2025-05-07T19:46:15.1026618Z 2025-05-07T19:46:15.1026628Z 2025-05-07T19:46:15.1026639Z 2025-05-07T19:46:15.1026649Z 2025-05-07T19:46:15.1026660Z 2025-05-07T19:46:15.1026670Z 2025-05-07T19:46:15.1026680Z 2025-05-07T19:46:15.1026712Z 2025-05-07T19:46:15.1026723Z 2025-05-07T19:46:15.1026734Z 2025-05-07T19:46:15.1027606Z cuda-nvcc-tools-12.8 | 24.5 MB | ########## | 100%  2025-05-07T19:46:15.1028539Z 2025-05-07T19:46:15.1028550Z 2025-05-07T19:46:15.1028561Z 2025-05-07T19:46:15.1028571Z 2025-05-07T19:46:15.1028581Z 2025-05-07T19:46:15.1028591Z 2025-05-07T19:46:15.1028601Z 2025-05-07T19:46:15.1028631Z 2025-05-07T19:46:15.1028663Z 2025-05-07T19:46:15.1028673Z 2025-05-07T19:46:15.1028684Z 2025-05-07T19:46:15.1028695Z 2025-05-07T19:46:15.1519530Z cuda-nvcc-tools-12.8 | 24.5 MB | ########## | 100%  2025-05-07T19:46:15.1520129Z 2025-05-07T19:46:15.1520135Z 2025-05-07T19:46:15.1520139Z 2025-05-07T19:46:15.1520160Z 2025-05-07T19:46:15.1520163Z 2025-05-07T19:46:15.1520167Z 2025-05-07T19:46:15.1520170Z 2025-05-07T19:46:15.1520174Z 2025-05-07T19:46:15.1520177Z 2025-05-07T19:46:15.1520180Z 2025-05-07T19:46:15.1520184Z 2025-05-07T19:46:15.1520187Z 2025-05-07T19:46:15.1520190Z 2025-05-07T19:46:15.1520194Z 2025-05-07T19:46:15.1520198Z 2025-05-07T19:46:15.1705176Z cuda-nvcc-dev_linux- | 12.7 MB | | 0%  2025-05-07T19:46:15.1705545Z 2025-05-07T19:46:15.1705563Z 2025-05-07T19:46:15.1705568Z 2025-05-07T19:46:15.1705571Z 2025-05-07T19:46:15.1705578Z 2025-05-07T19:46:15.1705582Z 2025-05-07T19:46:15.1705605Z 2025-05-07T19:46:15.1705609Z 2025-05-07T19:46:15.1705612Z 2025-05-07T19:46:15.1705616Z 2025-05-07T19:46:15.1705619Z 2025-05-07T19:46:15.1705623Z 2025-05-07T19:46:15.1705626Z 2025-05-07T19:46:15.1825316Z cuda-nvvm-tools-12.8 | 23.5 MB | #####9 | 59%  2025-05-07T19:46:15.1825677Z 2025-05-07T19:46:15.1825682Z 2025-05-07T19:46:15.1825686Z 2025-05-07T19:46:15.1825690Z 2025-05-07T19:46:15.1825693Z 2025-05-07T19:46:15.1825697Z 2025-05-07T19:46:15.1825700Z 2025-05-07T19:46:15.1825703Z 2025-05-07T19:46:15.1825707Z 2025-05-07T19:46:15.1825710Z 2025-05-07T19:46:15.1825713Z 2025-05-07T19:46:15.1825717Z 2025-05-07T19:46:15.1825721Z 2025-05-07T19:46:15.1825738Z 2025-05-07T19:46:15.2523234Z cuda-nvvm-impl-12.8. | 20.8 MB | ##9 | 30%  2025-05-07T19:46:15.2523612Z 2025-05-07T19:46:15.2523617Z 2025-05-07T19:46:15.2523625Z 2025-05-07T19:46:15.2523628Z 2025-05-07T19:46:15.2523632Z 2025-05-07T19:46:15.2523660Z 2025-05-07T19:46:15.2523678Z 2025-05-07T19:46:15.2523682Z 2025-05-07T19:46:15.2523685Z 2025-05-07T19:46:15.2523689Z 2025-05-07T19:46:15.2523693Z 2025-05-07T19:46:15.2523697Z 2025-05-07T19:46:15.2523701Z 2025-05-07T19:46:15.2523705Z 2025-05-07T19:46:15.2523721Z 2025-05-07T19:46:15.2825683Z cuda-nvcc-dev_linux- | 12.7 MB | #### | 41%  2025-05-07T19:46:15.2826054Z 2025-05-07T19:46:15.2826059Z 2025-05-07T19:46:15.2826063Z 2025-05-07T19:46:15.2826080Z 2025-05-07T19:46:15.2826084Z 2025-05-07T19:46:15.2826087Z 2025-05-07T19:46:15.2826091Z 2025-05-07T19:46:15.2826094Z 2025-05-07T19:46:15.2826098Z 2025-05-07T19:46:15.2826102Z 2025-05-07T19:46:15.2826105Z 2025-05-07T19:46:15.2826109Z 2025-05-07T19:46:15.2826112Z 2025-05-07T19:46:15.2826116Z 2025-05-07T19:46:15.2853193Z cuda-nvvm-impl-12.8. | 20.8 MB | #####4 | 54%  2025-05-07T19:46:15.2853556Z 2025-05-07T19:46:15.2853560Z 2025-05-07T19:46:15.2853790Z 2025-05-07T19:46:15.2853794Z 2025-05-07T19:46:15.2853798Z 2025-05-07T19:46:15.2853801Z 2025-05-07T19:46:15.2853805Z 2025-05-07T19:46:15.2853809Z 2025-05-07T19:46:15.2853812Z 2025-05-07T19:46:15.2853816Z 2025-05-07T19:46:15.2853827Z 2025-05-07T19:46:15.2853830Z 2025-05-07T19:46:15.2853833Z 2025-05-07T19:46:15.3525137Z cuda-nvvm-tools-12.8 | 23.5 MB | ########1 | 81%  2025-05-07T19:46:15.3525512Z 2025-05-07T19:46:15.3525517Z 2025-05-07T19:46:15.3525521Z 2025-05-07T19:46:15.3525524Z 2025-05-07T19:46:15.3525528Z 2025-05-07T19:46:15.3525532Z 2025-05-07T19:46:15.3525536Z 2025-05-07T19:46:15.3525539Z 2025-05-07T19:46:15.3525543Z 2025-05-07T19:46:15.3525547Z 2025-05-07T19:46:15.3525551Z 2025-05-07T19:46:15.3525555Z 2025-05-07T19:46:15.3525562Z 2025-05-07T19:46:15.3525566Z 2025-05-07T19:46:15.3525570Z 2025-05-07T19:46:15.3826476Z cuda-nvcc-dev_linux- | 12.7 MB | ########1 | 82%  2025-05-07T19:46:15.3826861Z 2025-05-07T19:46:15.3826866Z 2025-05-07T19:46:15.3826870Z 2025-05-07T19:46:15.3826873Z 2025-05-07T19:46:15.3826877Z 2025-05-07T19:46:15.3826880Z 2025-05-07T19:46:15.3826884Z 2025-05-07T19:46:15.3826888Z 2025-05-07T19:46:15.3827111Z 2025-05-07T19:46:15.3827116Z 2025-05-07T19:46:15.3827120Z 2025-05-07T19:46:15.3827123Z 2025-05-07T19:46:15.3827126Z 2025-05-07T19:46:15.3827137Z 2025-05-07T19:46:15.5477010Z cuda-nvvm-impl-12.8. | 20.8 MB | #######9 | 80%  2025-05-07T19:46:15.5477449Z 2025-05-07T19:46:15.5477454Z 2025-05-07T19:46:15.5477458Z 2025-05-07T19:46:15.5477462Z 2025-05-07T19:46:15.5477466Z 2025-05-07T19:46:15.5477469Z 2025-05-07T19:46:15.5477473Z 2025-05-07T19:46:15.5477476Z 2025-05-07T19:46:15.5477479Z 2025-05-07T19:46:15.5477483Z 2025-05-07T19:46:15.5477486Z 2025-05-07T19:46:15.5477489Z 2025-05-07T19:46:15.5477493Z 2025-05-07T19:46:15.5477496Z 2025-05-07T19:46:15.5477500Z 2025-05-07T19:46:15.5966244Z cuda-nvcc-dev_linux- | 12.7 MB | ########## | 100%  2025-05-07T19:46:15.5966667Z 2025-05-07T19:46:15.5966672Z 2025-05-07T19:46:15.5966676Z 2025-05-07T19:46:15.5966679Z 2025-05-07T19:46:15.5966683Z 2025-05-07T19:46:15.5966700Z 2025-05-07T19:46:15.5966704Z 2025-05-07T19:46:15.5966708Z 2025-05-07T19:46:15.5966711Z 2025-05-07T19:46:15.5966715Z 2025-05-07T19:46:15.5966718Z 2025-05-07T19:46:15.5966721Z 2025-05-07T19:46:15.5966725Z 2025-05-07T19:46:15.5966728Z 2025-05-07T19:46:15.5966732Z 2025-05-07T19:46:15.5966735Z 2025-05-07T19:46:15.6865520Z cuda-sanitizer-api-1 | 8.8 MB | | 0%  2025-05-07T19:46:15.6865912Z 2025-05-07T19:46:15.6865918Z 2025-05-07T19:46:15.6865923Z 2025-05-07T19:46:15.6865928Z 2025-05-07T19:46:15.6865933Z 2025-05-07T19:46:15.6865939Z 2025-05-07T19:46:15.6865944Z 2025-05-07T19:46:15.6865960Z 2025-05-07T19:46:15.6865965Z 2025-05-07T19:46:15.6865969Z 2025-05-07T19:46:15.6865998Z 2025-05-07T19:46:15.6866002Z 2025-05-07T19:46:15.6866006Z 2025-05-07T19:46:15.6874475Z cuda-nvvm-tools-12.8 | 23.5 MB | ########## | 100%  2025-05-07T19:46:15.6874956Z 2025-05-07T19:46:15.6874973Z 2025-05-07T19:46:15.6874991Z 2025-05-07T19:46:15.6874995Z 2025-05-07T19:46:15.6874999Z 2025-05-07T19:46:15.6875002Z 2025-05-07T19:46:15.6875006Z 2025-05-07T19:46:15.6875009Z 2025-05-07T19:46:15.6875012Z 2025-05-07T19:46:15.6875016Z 2025-05-07T19:46:15.6875026Z 2025-05-07T19:46:15.6875029Z 2025-05-07T19:46:15.6875033Z 2025-05-07T19:46:15.6876176Z 2025-05-07T19:46:15.7212044Z cuda-nvvm-impl-12.8. | 20.8 MB | ########## | 100%  2025-05-07T19:46:15.7212443Z 2025-05-07T19:46:15.7212449Z 2025-05-07T19:46:15.7212454Z 2025-05-07T19:46:15.7212458Z 2025-05-07T19:46:15.7212462Z 2025-05-07T19:46:15.7212466Z 2025-05-07T19:46:15.7212469Z 2025-05-07T19:46:15.7212474Z 2025-05-07T19:46:15.7212478Z 2025-05-07T19:46:15.7212744Z 2025-05-07T19:46:15.7212748Z 2025-05-07T19:46:15.7212752Z 2025-05-07T19:46:15.7212756Z 2025-05-07T19:46:15.7212760Z 2025-05-07T19:46:15.7212763Z 2025-05-07T19:46:15.7212766Z 2025-05-07T19:46:15.7212770Z 2025-05-07T19:46:15.7287296Z cuda-nvdisasm-12.8.5 | 4.9 MB | | 0%  2025-05-07T19:46:15.7287711Z 2025-05-07T19:46:15.7287716Z 2025-05-07T19:46:15.7287720Z 2025-05-07T19:46:15.7287724Z 2025-05-07T19:46:15.7287727Z 2025-05-07T19:46:15.7287731Z 2025-05-07T19:46:15.7287747Z 2025-05-07T19:46:15.7287751Z 2025-05-07T19:46:15.7287754Z 2025-05-07T19:46:15.7287757Z 2025-05-07T19:46:15.7287761Z 2025-05-07T19:46:15.7287764Z 2025-05-07T19:46:15.7287767Z 2025-05-07T19:46:15.7287771Z 2025-05-07T19:46:15.7287774Z 2025-05-07T19:46:15.7287777Z 2025-05-07T19:46:15.7287781Z 2025-05-07T19:46:15.7287784Z 2025-05-07T19:46:15.8099894Z cuda-cupti-dev-12.8. | 4.0 MB | | 0%  2025-05-07T19:46:15.8100327Z 2025-05-07T19:46:15.8100332Z 2025-05-07T19:46:15.8100335Z 2025-05-07T19:46:15.8100339Z 2025-05-07T19:46:15.8100343Z 2025-05-07T19:46:15.8100346Z 2025-05-07T19:46:15.8100350Z 2025-05-07T19:46:15.8100354Z 2025-05-07T19:46:15.8100557Z 2025-05-07T19:46:15.8100561Z 2025-05-07T19:46:15.8100565Z 2025-05-07T19:46:15.8100568Z 2025-05-07T19:46:15.8100572Z 2025-05-07T19:46:15.8100575Z 2025-05-07T19:46:15.8100578Z 2025-05-07T19:46:15.8100582Z 2025-05-07T19:46:15.8100941Z cuda-sanitizer-api-1 | 8.8 MB | ########## | 100%  2025-05-07T19:46:15.8101292Z 2025-05-07T19:46:15.8101295Z 2025-05-07T19:46:15.8101299Z 2025-05-07T19:46:15.8101303Z 2025-05-07T19:46:15.8101306Z 2025-05-07T19:46:15.8101309Z 2025-05-07T19:46:15.8101313Z 2025-05-07T19:46:15.8101316Z 2025-05-07T19:46:15.8101319Z 2025-05-07T19:46:15.8101323Z 2025-05-07T19:46:15.8101338Z 2025-05-07T19:46:15.8101342Z 2025-05-07T19:46:15.8101345Z 2025-05-07T19:46:15.8101357Z 2025-05-07T19:46:15.8101360Z 2025-05-07T19:46:15.8101364Z 2025-05-07T19:46:15.8328633Z cuda-sanitizer-api-1 | 8.8 MB | ########## | 100%  2025-05-07T19:46:15.8329029Z 2025-05-07T19:46:15.8329052Z 2025-05-07T19:46:15.8329056Z 2025-05-07T19:46:15.8329060Z 2025-05-07T19:46:15.8329064Z 2025-05-07T19:46:15.8329067Z 2025-05-07T19:46:15.8329071Z 2025-05-07T19:46:15.8329074Z 2025-05-07T19:46:15.8329078Z 2025-05-07T19:46:15.8329081Z 2025-05-07T19:46:15.8329084Z 2025-05-07T19:46:15.8329088Z 2025-05-07T19:46:15.8329091Z 2025-05-07T19:46:15.8329095Z 2025-05-07T19:46:15.8329099Z 2025-05-07T19:46:15.8329102Z 2025-05-07T19:46:15.8329106Z 2025-05-07T19:46:15.8329110Z 2025-05-07T19:46:15.8329453Z cuda-cupti-dev-12.8. | 4.0 MB | ########## | 100%  2025-05-07T19:46:15.8329793Z 2025-05-07T19:46:15.8331598Z 2025-05-07T19:46:15.8331603Z 2025-05-07T19:46:15.8331606Z 2025-05-07T19:46:15.8331622Z 2025-05-07T19:46:15.8331625Z 2025-05-07T19:46:15.8331646Z 2025-05-07T19:46:15.8331809Z 2025-05-07T19:46:15.8331818Z 2025-05-07T19:46:15.8331823Z 2025-05-07T19:46:15.8331827Z 2025-05-07T19:46:15.8331860Z 2025-05-07T19:46:15.8331864Z 2025-05-07T19:46:15.8331888Z 2025-05-07T19:46:15.8331892Z 2025-05-07T19:46:15.8331895Z 2025-05-07T19:46:15.8331899Z 2025-05-07T19:46:15.8331903Z 2025-05-07T19:46:15.8571776Z cuda-cupti-dev-12.8. | 4.0 MB | ########## | 100%  2025-05-07T19:46:15.8572219Z 2025-05-07T19:46:15.8572247Z 2025-05-07T19:46:15.8572252Z 2025-05-07T19:46:15.8572287Z 2025-05-07T19:46:15.8572292Z 2025-05-07T19:46:15.8572298Z 2025-05-07T19:46:15.8572303Z 2025-05-07T19:46:15.8572309Z 2025-05-07T19:46:15.8572315Z 2025-05-07T19:46:15.8572321Z 2025-05-07T19:46:15.8572327Z 2025-05-07T19:46:15.8572332Z 2025-05-07T19:46:15.8572338Z 2025-05-07T19:46:15.8572344Z 2025-05-07T19:46:15.8572349Z 2025-05-07T19:46:15.8572353Z 2025-05-07T19:46:15.8572616Z 2025-05-07T19:46:15.8572621Z 2025-05-07T19:46:15.8572625Z 2025-05-07T19:46:15.8658187Z ... (more hidden) ... 2025-05-07T19:46:15.8658560Z 2025-05-07T19:46:15.8658588Z 2025-05-07T19:46:15.8658592Z 2025-05-07T19:46:15.8658596Z 2025-05-07T19:46:15.8658599Z 2025-05-07T19:46:15.8658603Z 2025-05-07T19:46:15.8658607Z 2025-05-07T19:46:15.8658610Z 2025-05-07T19:46:15.8658614Z 2025-05-07T19:46:15.8658617Z 2025-05-07T19:46:15.8658620Z 2025-05-07T19:46:15.8658624Z 2025-05-07T19:46:15.8658627Z 2025-05-07T19:46:15.8658652Z 2025-05-07T19:46:15.8658657Z 2025-05-07T19:46:15.8658662Z 2025-05-07T19:46:15.8658665Z 2025-05-07T19:46:15.8659016Z cuda-nvdisasm-12.8.5 | 4.9 MB | ########## | 100%  2025-05-07T19:46:15.8659364Z 2025-05-07T19:46:15.8659369Z 2025-05-07T19:46:15.8659372Z 2025-05-07T19:46:15.8659376Z 2025-05-07T19:46:15.8659379Z 2025-05-07T19:46:15.8659383Z 2025-05-07T19:46:15.8659425Z 2025-05-07T19:46:15.8659428Z 2025-05-07T19:46:15.8659431Z 2025-05-07T19:46:15.8659435Z 2025-05-07T19:46:15.8659438Z 2025-05-07T19:46:15.8659441Z 2025-05-07T19:46:15.8659445Z 2025-05-07T19:46:15.8659448Z 2025-05-07T19:46:15.8659658Z 2025-05-07T19:46:15.8659663Z 2025-05-07T19:46:15.8659666Z 2025-05-07T19:46:15.9338667Z cuda-nvdisasm-12.8.5 | 4.9 MB | ########## | 100%  2025-05-07T19:46:15.9339091Z 2025-05-07T19:46:15.9339095Z 2025-05-07T19:46:15.9339100Z 2025-05-07T19:46:15.9339104Z 2025-05-07T19:46:15.9339107Z 2025-05-07T19:46:15.9339111Z 2025-05-07T19:46:15.9339116Z 2025-05-07T19:46:15.9339120Z 2025-05-07T19:46:15.9339126Z 2025-05-07T19:46:15.9339130Z 2025-05-07T19:46:15.9339133Z 2025-05-07T19:46:15.9339137Z 2025-05-07T19:46:15.9339141Z 2025-05-07T19:46:15.9339145Z 2025-05-07T19:46:15.9339149Z 2025-05-07T19:46:15.9339152Z 2025-05-07T19:46:15.9339155Z 2025-05-07T19:46:15.9339159Z 2025-05-07T19:46:15.9339162Z 2025-05-07T19:46:16.0359491Z ... (more hidden) ... 2025-05-07T19:46:16.0359846Z 2025-05-07T19:46:16.3145353Z nsight-compute-2025. | 320.6 MB | ########## | 100%  2025-05-07T19:46:16.3145709Z 2025-05-07T19:46:16.3145743Z 2025-05-07T19:46:16.3145748Z 2025-05-07T19:46:16.3145751Z 2025-05-07T19:46:16.3145778Z 2025-05-07T19:46:16.3145783Z 2025-05-07T19:46:16.3145786Z 2025-05-07T19:46:16.7547073Z cuda-nvvp-12.8.57 | 112.4 MB | ########## | 100%  2025-05-07T19:46:16.7547447Z 2025-05-07T19:46:16.7547453Z 2025-05-07T19:46:16.7547456Z 2025-05-07T19:46:16.7547461Z 2025-05-07T19:46:16.7547464Z 2025-05-07T19:46:16.8827366Z libnpp-12.3.3.65 | 130.6 MB | ########## | 100%  2025-05-07T19:46:16.8827698Z 2025-05-07T19:46:16.8827704Z 2025-05-07T19:46:16.8827710Z 2025-05-07T19:46:16.8827716Z 2025-05-07T19:46:16.8827722Z 2025-05-07T19:46:16.8827746Z 2025-05-07T19:46:16.8827752Z 2025-05-07T19:46:16.8827789Z 2025-05-07T19:46:16.9178439Z cuda-nvrtc-12.8.61 | 63.1 MB | ########## | 100%  2025-05-07T19:46:16.9178779Z 2025-05-07T19:46:16.9178785Z 2025-05-07T19:46:16.9178790Z 2025-05-07T19:46:16.9178794Z 2025-05-07T19:46:16.9178829Z 2025-05-07T19:46:16.9178848Z 2025-05-07T19:46:16.9178852Z 2025-05-07T19:46:16.9178856Z 2025-05-07T19:46:16.9178859Z 2025-05-07T19:46:17.0569114Z libcurand-10.3.9.55 | 43.6 MB | ########## | 100%  2025-05-07T19:46:17.0569774Z 2025-05-07T19:46:17.0569797Z 2025-05-07T19:46:17.0569802Z 2025-05-07T19:46:17.0569806Z 2025-05-07T19:46:17.0569810Z 2025-05-07T19:46:17.0569814Z 2025-05-07T19:46:17.0569818Z 2025-05-07T19:46:17.0569835Z 2025-05-07T19:46:17.0569840Z 2025-05-07T19:46:17.0569845Z 2025-05-07T19:46:17.3532510Z gds-tools-1.13.0.11 | 37.9 MB | ########## | 100%  2025-05-07T19:46:17.3532873Z 2025-05-07T19:46:17.3532898Z 2025-05-07T19:46:17.3532902Z 2025-05-07T19:46:17.3532906Z 2025-05-07T19:46:17.3533184Z 2025-05-07T19:46:17.3533189Z 2025-05-07T19:46:17.3533192Z 2025-05-07T19:46:17.3533212Z 2025-05-07T19:46:17.3533220Z 2025-05-07T19:46:17.3533224Z 2025-05-07T19:46:17.3533228Z 2025-05-07T19:46:17.3533232Z 2025-05-07T19:46:17.3533252Z 2025-05-07T19:46:17.3533255Z 2025-05-07T19:46:17.3533259Z 2025-05-07T19:46:17.3539014Z cuda-nvcc-dev_linux- | 12.7 MB | ########## | 100%  2025-05-07T19:46:17.3539371Z 2025-05-07T19:46:17.3539374Z 2025-05-07T19:46:17.3539378Z 2025-05-07T19:46:17.3539381Z 2025-05-07T19:46:17.3539385Z 2025-05-07T19:46:17.3539388Z 2025-05-07T19:46:17.3539392Z 2025-05-07T19:46:17.3539395Z 2025-05-07T19:46:17.3539398Z 2025-05-07T19:46:17.3539402Z 2025-05-07T19:46:17.3539575Z 2025-05-07T19:46:17.3774830Z libnvjitlink-12.8.61 | 28.7 MB | ########## | 100%  2025-05-07T19:46:17.3775280Z 2025-05-07T19:46:17.3775284Z 2025-05-07T19:46:17.3775288Z 2025-05-07T19:46:17.3775293Z 2025-05-07T19:46:17.3775314Z 2025-05-07T19:46:17.3775318Z 2025-05-07T19:46:17.3775338Z 2025-05-07T19:46:17.3775342Z 2025-05-07T19:46:17.3775345Z 2025-05-07T19:46:17.3775349Z 2025-05-07T19:46:17.3775352Z 2025-05-07T19:46:17.3775356Z 2025-05-07T19:46:17.5659414Z cuda-nvcc-tools-12.8 | 24.5 MB | ########## | 100%  2025-05-07T19:46:17.5659868Z 2025-05-07T19:46:17.5659874Z 2025-05-07T19:46:17.5659878Z 2025-05-07T19:46:17.5659882Z 2025-05-07T19:46:17.5659886Z 2025-05-07T19:46:17.5659889Z 2025-05-07T19:46:17.5659892Z 2025-05-07T19:46:17.5659896Z 2025-05-07T19:46:17.5659899Z 2025-05-07T19:46:17.5659903Z 2025-05-07T19:46:17.5659907Z 2025-05-07T19:46:17.5659911Z 2025-05-07T19:46:17.5659915Z 2025-05-07T19:46:17.5659919Z 2025-05-07T19:46:17.5659923Z 2025-05-07T19:46:17.5659927Z 2025-05-07T19:46:17.6995281Z cuda-sanitizer-api-1 | 8.8 MB | ########## | 100%  2025-05-07T19:46:17.6995736Z 2025-05-07T19:46:17.6995777Z 2025-05-07T19:46:17.6995781Z 2025-05-07T19:46:17.6995784Z 2025-05-07T19:46:17.6995788Z 2025-05-07T19:46:17.6995792Z 2025-05-07T19:46:17.6995796Z 2025-05-07T19:46:17.6995800Z 2025-05-07T19:46:17.6995805Z 2025-05-07T19:46:17.6995809Z 2025-05-07T19:46:17.6995827Z 2025-05-07T19:46:17.6995831Z 2025-05-07T19:46:17.6995835Z 2025-05-07T19:46:17.6995838Z 2025-05-07T19:46:17.7177333Z cuda-nvvm-impl-12.8. | 20.8 MB | ########## | 100%  2025-05-07T19:46:17.7177684Z 2025-05-07T19:46:17.7177689Z 2025-05-07T19:46:17.7177693Z 2025-05-07T19:46:17.7177697Z 2025-05-07T19:46:17.7177701Z 2025-05-07T19:46:17.7177705Z 2025-05-07T19:46:17.7177709Z 2025-05-07T19:46:17.7177713Z 2025-05-07T19:46:17.7177717Z 2025-05-07T19:46:17.7177735Z 2025-05-07T19:46:17.7177755Z 2025-05-07T19:46:17.7177760Z 2025-05-07T19:46:17.7177764Z 2025-05-07T19:46:17.7177769Z 2025-05-07T19:46:17.7177772Z 2025-05-07T19:46:17.7177775Z 2025-05-07T19:46:17.7177779Z 2025-05-07T19:46:17.7177804Z 2025-05-07T19:46:17.7377819Z cuda-cupti-dev-12.8. | 4.0 MB | ########## | 100%  2025-05-07T19:46:17.7378233Z 2025-05-07T19:46:17.7378238Z 2025-05-07T19:46:17.7378243Z 2025-05-07T19:46:17.7378265Z 2025-05-07T19:46:17.7378270Z 2025-05-07T19:46:17.7378273Z 2025-05-07T19:46:17.7378278Z 2025-05-07T19:46:17.7378281Z 2025-05-07T19:46:17.7378285Z 2025-05-07T19:46:17.7378288Z 2025-05-07T19:46:17.7378291Z 2025-05-07T19:46:17.7378295Z 2025-05-07T19:46:17.7378299Z 2025-05-07T19:46:17.7378302Z 2025-05-07T19:46:17.7378305Z 2025-05-07T19:46:17.7378309Z 2025-05-07T19:46:17.7378313Z 2025-05-07T19:46:17.7678869Z cuda-nvdisasm-12.8.5 | 4.9 MB | ########## | 100%  2025-05-07T19:46:17.7679263Z 2025-05-07T19:46:17.7679268Z 2025-05-07T19:46:17.7679272Z 2025-05-07T19:46:17.7679276Z 2025-05-07T19:46:17.7679280Z 2025-05-07T19:46:17.7679284Z 2025-05-07T19:46:17.7679288Z 2025-05-07T19:46:17.7679521Z 2025-05-07T19:46:17.7679525Z 2025-05-07T19:46:17.7679529Z 2025-05-07T19:46:17.7679533Z 2025-05-07T19:46:17.7679537Z 2025-05-07T19:46:17.7679541Z 2025-05-07T19:46:17.7721352Z cuda-nvvm-tools-12.8 | 23.5 MB | ########## | 100%  2025-05-07T19:46:17.7721729Z 2025-05-07T19:46:17.7721734Z 2025-05-07T19:46:17.7721738Z 2025-05-07T19:46:17.7721742Z 2025-05-07T19:46:17.7721746Z 2025-05-07T19:46:17.7721749Z 2025-05-07T19:46:17.7721752Z 2025-05-07T19:46:17.7721756Z 2025-05-07T19:46:17.7721775Z 2025-05-07T19:46:17.7721779Z 2025-05-07T19:46:17.7721782Z 2025-05-07T19:46:17.7721785Z 2025-05-07T19:46:17.7721789Z 2025-05-07T19:46:17.7721792Z 2025-05-07T19:46:17.7721796Z 2025-05-07T19:46:17.7721799Z 2025-05-07T19:46:17.7721802Z 2025-05-07T19:46:17.7721806Z 2025-05-07T19:46:17.7721809Z 2025-05-07T19:46:17.7722070Z ... (more hidden) ... 2025-05-07T19:46:17.7722375Z 2025-05-07T19:46:17.7722379Z 2025-05-07T19:46:17.7722391Z 2025-05-07T19:46:17.7722394Z 2025-05-07T19:46:17.7722397Z 2025-05-07T19:46:17.7722401Z 2025-05-07T19:46:17.7722404Z 2025-05-07T19:46:17.7722408Z 2025-05-07T19:46:17.7722411Z 2025-05-07T19:46:17.7722414Z 2025-05-07T19:46:17.7722611Z 2025-05-07T19:46:17.7722615Z 2025-05-07T19:46:17.7722619Z 2025-05-07T19:46:17.7722623Z 2025-05-07T19:46:17.7722626Z 2025-05-07T19:46:17.7722629Z 2025-05-07T19:46:17.7722633Z 2025-05-07T19:46:17.7722636Z 2025-05-07T19:46:17.7722639Z 2025-05-07T19:46:17.9142548Z ... (more hidden) ... 2025-05-07T19:46:21.5725954Z libcublas-12.8.3.14 | 460.2 MB | ########## | 100% 2025-05-07T19:46:22.1505855Z libcublas-12.8.3.14 | 460.2 MB | ########## | 100% 2025-05-07T19:46:22.1507100Z 2025-05-07T19:46:22.1522731Z nsight-compute-2025. | 320.6 MB | ########## | 100%  2025-05-07T19:46:22.1523110Z 2025-05-07T19:46:22.1523117Z 2025-05-07T19:46:22.1523123Z 2025-05-07T19:46:22.1523129Z 2025-05-07T19:46:22.1523171Z 2025-05-07T19:46:22.1523176Z 2025-05-07T19:46:22.1523181Z 2025-05-07T19:46:22.1523185Z 2025-05-07T19:46:22.1523190Z 2025-05-07T19:46:22.1523196Z 2025-05-07T19:46:22.1523199Z 2025-05-07T19:46:22.1523203Z 2025-05-07T19:46:22.1523224Z 2025-05-07T19:46:22.1523227Z 2025-05-07T19:46:22.1523231Z 2025-05-07T19:46:22.1523235Z 2025-05-07T19:46:22.1523238Z 2025-05-07T19:46:22.1523241Z 2025-05-07T19:46:22.1523245Z 2025-05-07T19:46:22.1523367Z 2025-05-07T19:46:22.1523714Z  2025-05-07T19:46:22.1524077Z 2025-05-07T19:46:22.1524297Z 2025-05-07T19:46:22.1524469Z  2025-05-07T19:46:22.1524711Z 2025-05-07T19:46:22.1524715Z 2025-05-07T19:46:22.1524889Z  2025-05-07T19:46:22.1525109Z 2025-05-07T19:46:22.1525113Z 2025-05-07T19:46:22.1525117Z 2025-05-07T19:46:22.1525326Z  2025-05-07T19:46:22.1525549Z 2025-05-07T19:46:22.1525553Z 2025-05-07T19:46:22.1525556Z 2025-05-07T19:46:22.1525559Z 2025-05-07T19:46:22.1525769Z  2025-05-07T19:46:22.1526000Z 2025-05-07T19:46:22.1526003Z 2025-05-07T19:46:22.1526007Z 2025-05-07T19:46:22.1526011Z 2025-05-07T19:46:22.1526014Z 2025-05-07T19:46:22.1526199Z  2025-05-07T19:46:22.1526448Z 2025-05-07T19:46:22.1526451Z 2025-05-07T19:46:22.1526455Z 2025-05-07T19:46:22.1526459Z 2025-05-07T19:46:22.1526462Z 2025-05-07T19:46:22.1526465Z 2025-05-07T19:46:22.1526657Z  2025-05-07T19:46:22.1526888Z 2025-05-07T19:46:22.1526892Z 2025-05-07T19:46:22.1526916Z 2025-05-07T19:46:22.1526919Z 2025-05-07T19:46:22.1526922Z 2025-05-07T19:46:22.1526926Z 2025-05-07T19:46:22.1527240Z 2025-05-07T19:46:22.1527448Z  2025-05-07T19:46:22.1527682Z 2025-05-07T19:46:22.1527687Z 2025-05-07T19:46:22.1527690Z 2025-05-07T19:46:22.1527695Z 2025-05-07T19:46:22.1527713Z 2025-05-07T19:46:22.1527739Z 2025-05-07T19:46:22.1527742Z 2025-05-07T19:46:22.1527745Z 2025-05-07T19:46:22.1527941Z  2025-05-07T19:46:22.1528176Z 2025-05-07T19:46:22.1528180Z 2025-05-07T19:46:22.1528185Z 2025-05-07T19:46:22.1528188Z 2025-05-07T19:46:22.1528192Z 2025-05-07T19:46:22.1528196Z 2025-05-07T19:46:22.1528200Z 2025-05-07T19:46:22.1528223Z 2025-05-07T19:46:22.1528226Z 2025-05-07T19:46:22.1528424Z  2025-05-07T19:46:22.1528662Z 2025-05-07T19:46:22.1528870Z 2025-05-07T19:46:22.1528874Z 2025-05-07T19:46:22.1528878Z 2025-05-07T19:46:22.1528882Z 2025-05-07T19:46:22.1528885Z 2025-05-07T19:46:22.1528896Z 2025-05-07T19:46:22.1528899Z 2025-05-07T19:46:22.1528902Z 2025-05-07T19:46:22.1528906Z 2025-05-07T19:46:22.1529103Z  2025-05-07T19:46:22.1530741Z 2025-05-07T19:46:22.1530748Z 2025-05-07T19:46:22.1530752Z 2025-05-07T19:46:22.1530755Z 2025-05-07T19:46:22.1530759Z 2025-05-07T19:46:22.1530762Z 2025-05-07T19:46:22.1530766Z 2025-05-07T19:46:22.1530769Z 2025-05-07T19:46:22.1530772Z 2025-05-07T19:46:22.1530775Z 2025-05-07T19:46:22.1530779Z 2025-05-07T19:46:22.1531014Z  2025-05-07T19:46:22.1531284Z 2025-05-07T19:46:22.1531288Z 2025-05-07T19:46:22.1531291Z 2025-05-07T19:46:22.1531295Z 2025-05-07T19:46:22.1531299Z 2025-05-07T19:46:22.1531303Z 2025-05-07T19:46:22.1531306Z 2025-05-07T19:46:22.1531310Z 2025-05-07T19:46:22.1531313Z 2025-05-07T19:46:22.1531317Z 2025-05-07T19:46:22.1531320Z 2025-05-07T19:46:22.1531330Z 2025-05-07T19:46:22.1531561Z  2025-05-07T19:46:22.1531809Z 2025-05-07T19:46:22.1531813Z 2025-05-07T19:46:22.1531817Z 2025-05-07T19:46:22.1531825Z 2025-05-07T19:46:22.1531828Z 2025-05-07T19:46:22.1531832Z 2025-05-07T19:46:22.1531835Z 2025-05-07T19:46:22.1531839Z 2025-05-07T19:46:22.1531842Z 2025-05-07T19:46:22.1531846Z 2025-05-07T19:46:22.1531850Z 2025-05-07T19:46:22.1531854Z 2025-05-07T19:46:22.1531858Z 2025-05-07T19:46:22.1532097Z  2025-05-07T19:46:22.1532343Z 2025-05-07T19:46:22.1532346Z 2025-05-07T19:46:22.1532350Z 2025-05-07T19:46:22.1532353Z 2025-05-07T19:46:22.1532357Z 2025-05-07T19:46:22.1532360Z 2025-05-07T19:46:22.1532364Z 2025-05-07T19:46:22.1532367Z 2025-05-07T19:46:22.1532371Z 2025-05-07T19:46:22.1532374Z 2025-05-07T19:46:22.1532378Z 2025-05-07T19:46:22.1532381Z 2025-05-07T19:46:22.1532406Z 2025-05-07T19:46:22.1532410Z 2025-05-07T19:46:22.1532625Z  2025-05-07T19:46:22.1532870Z 2025-05-07T19:46:22.1532874Z 2025-05-07T19:46:22.1532881Z 2025-05-07T19:46:22.1532885Z 2025-05-07T19:46:22.1532888Z 2025-05-07T19:46:22.1532891Z 2025-05-07T19:46:22.1532895Z 2025-05-07T19:46:22.1532898Z 2025-05-07T19:46:22.1532919Z 2025-05-07T19:46:22.1532922Z 2025-05-07T19:46:22.1532926Z 2025-05-07T19:46:22.1532929Z 2025-05-07T19:46:22.1532932Z 2025-05-07T19:46:22.1532936Z 2025-05-07T19:46:22.1532939Z 2025-05-07T19:46:22.1533161Z  2025-05-07T19:46:22.1533408Z 2025-05-07T19:46:22.1533412Z 2025-05-07T19:46:22.1533434Z 2025-05-07T19:46:22.1533437Z 2025-05-07T19:46:22.1533441Z 2025-05-07T19:46:22.1533444Z 2025-05-07T19:46:22.1533447Z 2025-05-07T19:46:22.1533451Z 2025-05-07T19:46:22.1533454Z 2025-05-07T19:46:22.1533551Z 2025-05-07T19:46:22.1533555Z 2025-05-07T19:46:22.1533558Z 2025-05-07T19:46:22.1533562Z 2025-05-07T19:46:22.1533566Z 2025-05-07T19:46:22.1533569Z 2025-05-07T19:46:22.1533572Z 2025-05-07T19:46:22.1533801Z  2025-05-07T19:46:22.1534070Z 2025-05-07T19:46:22.1534074Z 2025-05-07T19:46:22.1534078Z 2025-05-07T19:46:22.1534081Z 2025-05-07T19:46:22.1534085Z 2025-05-07T19:46:22.1534088Z 2025-05-07T19:46:22.1534091Z 2025-05-07T19:46:22.1534095Z 2025-05-07T19:46:22.1534098Z 2025-05-07T19:46:22.1534101Z 2025-05-07T19:46:22.1534105Z 2025-05-07T19:46:22.1534108Z 2025-05-07T19:46:22.1534111Z 2025-05-07T19:46:22.1534114Z 2025-05-07T19:46:22.1534118Z 2025-05-07T19:46:22.1534122Z 2025-05-07T19:46:22.1534125Z 2025-05-07T19:46:22.1534376Z  2025-05-07T19:46:22.1534629Z 2025-05-07T19:46:22.1534638Z 2025-05-07T19:46:22.1534642Z 2025-05-07T19:46:22.1534645Z 2025-05-07T19:46:22.1534648Z 2025-05-07T19:46:22.1534652Z 2025-05-07T19:46:22.1534656Z 2025-05-07T19:46:22.1534659Z 2025-05-07T19:46:22.1534662Z 2025-05-07T19:46:22.1534731Z 2025-05-07T19:46:22.1534735Z 2025-05-07T19:46:22.1534739Z 2025-05-07T19:46:22.1534759Z 2025-05-07T19:46:22.1534763Z 2025-05-07T19:46:22.1534766Z 2025-05-07T19:46:22.1534769Z 2025-05-07T19:46:22.1534773Z 2025-05-07T19:46:22.1534776Z 2025-05-07T19:46:22.1535145Z  2025-05-07T19:46:22.1535423Z 2025-05-07T19:46:22.1535427Z 2025-05-07T19:46:22.1535554Z  2025-05-07T19:46:22.1535665Z 2025-05-07T19:46:22.1535669Z 2025-05-07T19:46:22.1535780Z  2025-05-07T19:46:22.1535913Z 2025-05-07T19:46:22.1535917Z 2025-05-07T19:46:22.1535920Z 2025-05-07T19:46:22.1536027Z  2025-05-07T19:46:22.1536144Z 2025-05-07T19:46:22.1536148Z 2025-05-07T19:46:22.1536156Z 2025-05-07T19:46:22.1536160Z 2025-05-07T19:46:22.1536308Z  2025-05-07T19:46:22.1536431Z 2025-05-07T19:46:22.1536435Z 2025-05-07T19:46:22.1536439Z 2025-05-07T19:46:22.1536442Z 2025-05-07T19:46:22.1536446Z 2025-05-07T19:46:22.1536562Z  2025-05-07T19:46:22.1536714Z 2025-05-07T19:46:22.1536718Z 2025-05-07T19:46:22.1536722Z 2025-05-07T19:46:22.1536726Z 2025-05-07T19:46:22.1536729Z 2025-05-07T19:46:22.1536733Z 2025-05-07T19:46:22.1536847Z  2025-05-07T19:46:22.1536990Z 2025-05-07T19:46:22.1537013Z 2025-05-07T19:46:22.1537017Z 2025-05-07T19:46:22.1537020Z 2025-05-07T19:46:22.1537024Z 2025-05-07T19:46:22.1537027Z 2025-05-07T19:46:22.1537031Z 2025-05-07T19:46:22.1537149Z  2025-05-07T19:46:22.1537297Z 2025-05-07T19:46:22.1537300Z 2025-05-07T19:46:22.1537304Z 2025-05-07T19:46:22.1537307Z 2025-05-07T19:46:22.1537311Z 2025-05-07T19:46:22.1537334Z 2025-05-07T19:46:22.1537337Z 2025-05-07T19:46:22.1537341Z 2025-05-07T19:46:22.1537470Z  2025-05-07T19:46:22.1537625Z 2025-05-07T19:46:22.1537629Z 2025-05-07T19:46:22.1537632Z 2025-05-07T19:46:22.1537636Z 2025-05-07T19:46:22.1537639Z 2025-05-07T19:46:22.1537643Z 2025-05-07T19:46:22.1537646Z 2025-05-07T19:46:22.1537653Z 2025-05-07T19:46:22.1537676Z 2025-05-07T19:46:22.1537801Z  2025-05-07T19:46:22.1537966Z 2025-05-07T19:46:22.1537970Z 2025-05-07T19:46:22.1537974Z 2025-05-07T19:46:22.1537978Z 2025-05-07T19:46:22.1537982Z 2025-05-07T19:46:22.1537985Z 2025-05-07T19:46:22.1537989Z 2025-05-07T19:46:22.1537993Z 2025-05-07T19:46:22.1537996Z 2025-05-07T19:46:22.1538000Z 2025-05-07T19:46:22.1538152Z  2025-05-07T19:46:22.1538329Z 2025-05-07T19:46:22.1538332Z 2025-05-07T19:46:22.1538336Z 2025-05-07T19:46:22.1538339Z 2025-05-07T19:46:22.1538343Z 2025-05-07T19:46:22.1538346Z 2025-05-07T19:46:22.1538350Z 2025-05-07T19:46:22.1538353Z 2025-05-07T19:46:22.1538356Z 2025-05-07T19:46:22.1538360Z 2025-05-07T19:46:22.1538432Z 2025-05-07T19:46:22.1538689Z  2025-05-07T19:46:22.1538875Z 2025-05-07T19:46:22.1538879Z 2025-05-07T19:46:22.1538882Z 2025-05-07T19:46:22.1538886Z 2025-05-07T19:46:22.1538890Z 2025-05-07T19:46:22.1538898Z 2025-05-07T19:46:22.1538901Z 2025-05-07T19:46:22.1538905Z 2025-05-07T19:46:22.1538908Z 2025-05-07T19:46:22.1538931Z 2025-05-07T19:46:22.1538935Z 2025-05-07T19:46:22.1538938Z 2025-05-07T19:46:22.1539079Z  2025-05-07T19:46:22.1539279Z 2025-05-07T19:46:22.1539283Z 2025-05-07T19:46:22.1539286Z 2025-05-07T19:46:22.1539290Z 2025-05-07T19:46:22.1539293Z 2025-05-07T19:46:22.1539297Z 2025-05-07T19:46:22.1539300Z 2025-05-07T19:46:22.1539304Z 2025-05-07T19:46:22.1539327Z 2025-05-07T19:46:22.1539330Z 2025-05-07T19:46:22.1539334Z 2025-05-07T19:46:22.1539337Z 2025-05-07T19:46:22.1539340Z 2025-05-07T19:46:22.1539491Z  2025-05-07T19:46:22.1539692Z 2025-05-07T19:46:22.1539700Z 2025-05-07T19:46:22.1539704Z 2025-05-07T19:46:22.1539707Z 2025-05-07T19:46:22.1539710Z 2025-05-07T19:46:22.1539714Z 2025-05-07T19:46:22.1539739Z 2025-05-07T19:46:22.1539742Z 2025-05-07T19:46:22.1539746Z 2025-05-07T19:46:22.1539829Z 2025-05-07T19:46:22.1539833Z 2025-05-07T19:46:22.1539836Z 2025-05-07T19:46:22.1539840Z 2025-05-07T19:46:22.1539843Z 2025-05-07T19:46:22.1539990Z  2025-05-07T19:46:22.1540195Z 2025-05-07T19:46:22.1540199Z 2025-05-07T19:46:22.1540202Z 2025-05-07T19:46:22.1540223Z 2025-05-07T19:46:22.1540227Z 2025-05-07T19:46:22.1540230Z 2025-05-07T19:46:22.1540234Z 2025-05-07T19:46:22.1540237Z 2025-05-07T19:46:22.1540240Z 2025-05-07T19:46:22.1540244Z 2025-05-07T19:46:22.1540247Z 2025-05-07T19:46:22.1540250Z 2025-05-07T19:46:22.1540254Z 2025-05-07T19:46:22.1540257Z 2025-05-07T19:46:22.1540260Z 2025-05-07T19:46:22.1540413Z  2025-05-07T19:46:22.1540639Z 2025-05-07T19:46:22.1540647Z 2025-05-07T19:46:22.1540651Z 2025-05-07T19:46:22.1540654Z 2025-05-07T19:46:22.1540657Z 2025-05-07T19:46:22.1540661Z 2025-05-07T19:46:22.1540664Z 2025-05-07T19:46:22.1540667Z 2025-05-07T19:46:22.1540671Z 2025-05-07T19:46:22.1540674Z 2025-05-07T19:46:22.1540680Z 2025-05-07T19:46:22.1540684Z 2025-05-07T19:46:22.1540687Z 2025-05-07T19:46:22.1540691Z 2025-05-07T19:46:22.1540694Z 2025-05-07T19:46:22.1540697Z 2025-05-07T19:46:22.1540878Z  2025-05-07T19:46:22.1541101Z 2025-05-07T19:46:22.1541104Z 2025-05-07T19:46:22.1541108Z 2025-05-07T19:46:22.1541111Z 2025-05-07T19:46:22.1541115Z 2025-05-07T19:46:22.1541118Z 2025-05-07T19:46:22.1541122Z 2025-05-07T19:46:22.1541125Z 2025-05-07T19:46:22.1541129Z 2025-05-07T19:46:22.1541132Z 2025-05-07T19:46:22.1541136Z 2025-05-07T19:46:22.1541139Z 2025-05-07T19:46:22.1541142Z 2025-05-07T19:46:22.1541145Z 2025-05-07T19:46:22.1541149Z 2025-05-07T19:46:22.1541152Z 2025-05-07T19:46:22.1541176Z 2025-05-07T19:46:22.1541344Z  2025-05-07T19:46:22.1541565Z 2025-05-07T19:46:22.1541568Z 2025-05-07T19:46:22.1541572Z 2025-05-07T19:46:22.1541575Z 2025-05-07T19:46:22.1541578Z 2025-05-07T19:46:22.1541585Z 2025-05-07T19:46:22.1541589Z 2025-05-07T19:46:22.1541592Z 2025-05-07T19:46:22.1541595Z 2025-05-07T19:46:22.1541599Z 2025-05-07T19:46:22.1541621Z 2025-05-07T19:46:22.1541625Z 2025-05-07T19:46:22.1541628Z 2025-05-07T19:46:22.1541631Z 2025-05-07T19:46:22.1541634Z 2025-05-07T19:46:22.1541638Z 2025-05-07T19:46:22.1541641Z 2025-05-07T19:46:22.1541644Z 2025-05-07T19:46:22.1541816Z  2025-05-07T19:46:22.1542041Z 2025-05-07T19:46:22.1542044Z 2025-05-07T19:46:22.1542203Z  2025-05-07T19:46:22.1542312Z 2025-05-07T19:46:22.1542316Z 2025-05-07T19:46:22.1542421Z  2025-05-07T19:46:22.1542560Z 2025-05-07T19:46:22.1542563Z 2025-05-07T19:46:22.1542567Z 2025-05-07T19:46:22.1542671Z  2025-05-07T19:46:22.1542851Z 2025-05-07T19:46:22.1542855Z 2025-05-07T19:46:22.1542858Z 2025-05-07T19:46:22.1542862Z 2025-05-07T19:46:22.1542992Z  2025-05-07T19:46:22.1543117Z 2025-05-07T19:46:22.1543121Z 2025-05-07T19:46:22.1543124Z 2025-05-07T19:46:22.1543132Z 2025-05-07T19:46:22.1543136Z 2025-05-07T19:46:22.1543252Z  2025-05-07T19:46:22.1543410Z 2025-05-07T19:46:22.1543413Z 2025-05-07T19:46:22.1543417Z 2025-05-07T19:46:22.1543420Z 2025-05-07T19:46:22.1543423Z 2025-05-07T19:46:22.1543427Z 2025-05-07T19:46:22.1543543Z  2025-05-07T19:46:22.1543705Z 2025-05-07T19:46:22.1543709Z 2025-05-07T19:46:22.1543712Z 2025-05-07T19:46:22.1543716Z 2025-05-07T19:46:22.1543719Z 2025-05-07T19:46:22.1543722Z 2025-05-07T19:46:22.1543725Z 2025-05-07T19:46:22.1543843Z  2025-05-07T19:46:22.1543990Z 2025-05-07T19:46:22.1543994Z 2025-05-07T19:46:22.1543998Z 2025-05-07T19:46:22.1544001Z 2025-05-07T19:46:22.1544022Z 2025-05-07T19:46:22.1544026Z 2025-05-07T19:46:22.1544034Z 2025-05-07T19:46:22.1544038Z 2025-05-07T19:46:22.1544165Z  2025-05-07T19:46:22.1544319Z 2025-05-07T19:46:22.1544323Z 2025-05-07T19:46:22.1544326Z 2025-05-07T19:46:22.1544330Z 2025-05-07T19:46:22.1544396Z 2025-05-07T19:46:22.1544400Z 2025-05-07T19:46:22.1544404Z 2025-05-07T19:46:22.1544427Z 2025-05-07T19:46:22.1544430Z 2025-05-07T19:46:22.1544558Z  2025-05-07T19:46:22.1544722Z 2025-05-07T19:46:22.1544726Z 2025-05-07T19:46:22.1544729Z 2025-05-07T19:46:22.1544733Z 2025-05-07T19:46:22.1544736Z 2025-05-07T19:46:22.1544740Z 2025-05-07T19:46:22.1544743Z 2025-05-07T19:46:22.1544748Z 2025-05-07T19:46:22.1544751Z 2025-05-07T19:46:22.1544772Z 2025-05-07T19:46:22.1544916Z  2025-05-07T19:46:22.1545093Z 2025-05-07T19:46:22.1545096Z 2025-05-07T19:46:22.1545100Z 2025-05-07T19:46:22.1545103Z 2025-05-07T19:46:22.1545107Z 2025-05-07T19:46:22.1545110Z 2025-05-07T19:46:22.1545114Z 2025-05-07T19:46:22.1545121Z 2025-05-07T19:46:22.1545125Z 2025-05-07T19:46:22.1545146Z 2025-05-07T19:46:22.1545150Z 2025-05-07T19:46:22.1545288Z  2025-05-07T19:46:22.1545470Z 2025-05-07T19:46:22.1545474Z 2025-05-07T19:46:22.1545481Z 2025-05-07T19:46:22.1545484Z 2025-05-07T19:46:22.1545488Z 2025-05-07T19:46:22.1545491Z 2025-05-07T19:46:22.1545495Z 2025-05-07T19:46:22.1545499Z 2025-05-07T19:46:22.1545502Z 2025-05-07T19:46:22.1545525Z 2025-05-07T19:46:22.1545528Z 2025-05-07T19:46:22.1545532Z 2025-05-07T19:46:22.1545670Z  2025-05-07T19:46:22.1545863Z 2025-05-07T19:46:22.1545866Z 2025-05-07T19:46:22.1545870Z 2025-05-07T19:46:22.1545873Z 2025-05-07T19:46:22.1545878Z 2025-05-07T19:46:22.1545881Z 2025-05-07T19:46:22.1545885Z 2025-05-07T19:46:22.1545890Z 2025-05-07T19:46:22.1545911Z 2025-05-07T19:46:22.1545914Z 2025-05-07T19:46:22.1545917Z 2025-05-07T19:46:22.1545921Z 2025-05-07T19:46:22.1545924Z 2025-05-07T19:46:22.1546067Z  2025-05-07T19:46:22.1546268Z 2025-05-07T19:46:22.1546272Z 2025-05-07T19:46:22.1546276Z 2025-05-07T19:46:22.1546280Z 2025-05-07T19:46:22.1546284Z 2025-05-07T19:46:22.1546287Z 2025-05-07T19:46:22.1546309Z 2025-05-07T19:46:22.1546316Z 2025-05-07T19:46:22.1546320Z 2025-05-07T19:46:22.1546324Z 2025-05-07T19:46:22.1546327Z 2025-05-07T19:46:22.1546332Z 2025-05-07T19:46:22.1546336Z 2025-05-07T19:46:22.1546339Z 2025-05-07T19:46:22.1546484Z  2025-05-07T19:46:22.1546690Z 2025-05-07T19:46:22.1546695Z 2025-05-07T19:46:22.1546716Z 2025-05-07T19:46:22.1546719Z 2025-05-07T19:46:22.1546723Z 2025-05-07T19:46:22.1546727Z 2025-05-07T19:46:22.1546730Z 2025-05-07T19:46:22.1546734Z 2025-05-07T19:46:22.1546737Z 2025-05-07T19:46:22.1546740Z 2025-05-07T19:46:22.1546744Z 2025-05-07T19:46:22.1546747Z 2025-05-07T19:46:22.1546750Z 2025-05-07T19:46:22.1546754Z 2025-05-07T19:46:22.1546757Z 2025-05-07T19:46:22.1546912Z  2025-05-07T19:46:22.1547205Z 2025-05-07T19:46:22.1547208Z 2025-05-07T19:46:22.1547212Z 2025-05-07T19:46:22.1547215Z 2025-05-07T19:46:22.1547219Z 2025-05-07T19:46:22.1547222Z 2025-05-07T19:46:22.1547226Z 2025-05-07T19:46:22.1547233Z 2025-05-07T19:46:22.1547236Z 2025-05-07T19:46:22.1547240Z 2025-05-07T19:46:22.1547243Z 2025-05-07T19:46:22.1547246Z 2025-05-07T19:46:22.1547250Z 2025-05-07T19:46:22.1547253Z 2025-05-07T19:46:22.1547256Z 2025-05-07T19:46:22.1547260Z 2025-05-07T19:46:22.1547445Z  2025-05-07T19:46:22.1547662Z 2025-05-07T19:46:22.1547666Z 2025-05-07T19:46:22.1547669Z 2025-05-07T19:46:22.1547673Z 2025-05-07T19:46:22.1547676Z 2025-05-07T19:46:22.1547680Z 2025-05-07T19:46:22.1547683Z 2025-05-07T19:46:22.1547686Z 2025-05-07T19:46:22.1547690Z 2025-05-07T19:46:22.1547693Z 2025-05-07T19:46:22.1547697Z 2025-05-07T19:46:22.1547700Z 2025-05-07T19:46:22.1547704Z 2025-05-07T19:46:22.1547707Z 2025-05-07T19:46:22.1547711Z 2025-05-07T19:46:22.1547721Z 2025-05-07T19:46:22.1547744Z 2025-05-07T19:46:22.1547909Z  2025-05-07T19:46:22.1548130Z 2025-05-07T19:46:22.1548134Z 2025-05-07T19:46:22.1548137Z 2025-05-07T19:46:22.1548196Z 2025-05-07T19:46:22.1548200Z 2025-05-07T19:46:22.1548203Z 2025-05-07T19:46:22.1548207Z 2025-05-07T19:46:22.1548210Z 2025-05-07T19:46:22.1548213Z 2025-05-07T19:46:22.1548217Z 2025-05-07T19:46:22.1548238Z 2025-05-07T19:46:22.1548242Z 2025-05-07T19:46:22.1548245Z 2025-05-07T19:46:22.1548248Z 2025-05-07T19:46:22.1548252Z 2025-05-07T19:46:22.1548255Z 2025-05-07T19:46:22.1548258Z 2025-05-07T19:46:22.1548261Z 2025-05-07T19:46:22.1548433Z  2025-05-07T19:46:22.1548661Z 2025-05-07T19:46:22.1548664Z 2025-05-07T19:46:22.1548783Z  2025-05-07T19:46:22.1548890Z 2025-05-07T19:46:22.1548894Z 2025-05-07T19:46:22.1548999Z  2025-05-07T19:46:22.1549132Z 2025-05-07T19:46:22.1549136Z 2025-05-07T19:46:22.1549144Z 2025-05-07T19:46:22.1549249Z  2025-05-07T19:46:22.1549363Z 2025-05-07T19:46:22.1549367Z 2025-05-07T19:46:22.1549370Z 2025-05-07T19:46:22.1549375Z 2025-05-07T19:46:22.1549505Z  2025-05-07T19:46:22.1549628Z 2025-05-07T19:46:22.1549632Z 2025-05-07T19:46:22.1549635Z 2025-05-07T19:46:22.1549638Z 2025-05-07T19:46:22.1549642Z 2025-05-07T19:46:22.1549754Z  2025-05-07T19:46:22.1549905Z 2025-05-07T19:46:22.1549910Z 2025-05-07T19:46:22.1549913Z 2025-05-07T19:46:22.1549916Z 2025-05-07T19:46:22.1549920Z 2025-05-07T19:46:22.1549923Z 2025-05-07T19:46:22.1550037Z  2025-05-07T19:46:22.1550193Z 2025-05-07T19:46:22.1550197Z 2025-05-07T19:46:22.1550200Z 2025-05-07T19:46:22.1550204Z 2025-05-07T19:46:22.1550208Z 2025-05-07T19:46:22.1550212Z 2025-05-07T19:46:22.1550215Z 2025-05-07T19:46:22.1550422Z  2025-05-07T19:46:22.1550570Z 2025-05-07T19:46:22.1550574Z 2025-05-07T19:46:22.1550601Z 2025-05-07T19:46:22.1550604Z 2025-05-07T19:46:22.1550612Z 2025-05-07T19:46:22.1550615Z 2025-05-07T19:46:22.1550619Z 2025-05-07T19:46:22.1550622Z 2025-05-07T19:46:22.1550743Z  2025-05-07T19:46:22.1550900Z 2025-05-07T19:46:22.1550904Z 2025-05-07T19:46:22.1550911Z 2025-05-07T19:46:22.1550914Z 2025-05-07T19:46:22.1550918Z 2025-05-07T19:46:22.1550940Z 2025-05-07T19:46:22.1550944Z 2025-05-07T19:46:22.1550947Z 2025-05-07T19:46:22.1550951Z 2025-05-07T19:46:22.1551075Z  2025-05-07T19:46:22.1551240Z 2025-05-07T19:46:22.1551243Z 2025-05-07T19:46:22.1551247Z 2025-05-07T19:46:22.1551250Z 2025-05-07T19:46:22.1551253Z 2025-05-07T19:46:22.1551257Z 2025-05-07T19:46:22.1551260Z 2025-05-07T19:46:22.1551280Z 2025-05-07T19:46:22.1551284Z 2025-05-07T19:46:22.1551287Z 2025-05-07T19:46:22.1551418Z  2025-05-07T19:46:22.1551589Z 2025-05-07T19:46:22.1551593Z 2025-05-07T19:46:22.1551596Z 2025-05-07T19:46:22.1551600Z 2025-05-07T19:46:22.1551603Z 2025-05-07T19:46:22.1551669Z 2025-05-07T19:46:22.1551673Z 2025-05-07T19:46:22.1551676Z 2025-05-07T19:46:22.1551699Z 2025-05-07T19:46:22.1551702Z 2025-05-07T19:46:22.1551706Z 2025-05-07T19:46:22.1551840Z  2025-05-07T19:46:22.1552027Z 2025-05-07T19:46:22.1552031Z 2025-05-07T19:46:22.1552035Z 2025-05-07T19:46:22.1552038Z 2025-05-07T19:46:22.1552041Z 2025-05-07T19:46:22.1552045Z 2025-05-07T19:46:22.1552048Z 2025-05-07T19:46:22.1552052Z 2025-05-07T19:46:22.1552074Z 2025-05-07T19:46:22.1552078Z 2025-05-07T19:46:22.1552081Z 2025-05-07T19:46:22.1552084Z 2025-05-07T19:46:22.1552221Z  2025-05-07T19:46:22.1552413Z 2025-05-07T19:46:22.1552417Z 2025-05-07T19:46:22.1552420Z 2025-05-07T19:46:22.1552424Z 2025-05-07T19:46:22.1552427Z 2025-05-07T19:46:22.1552431Z 2025-05-07T19:46:22.1552434Z 2025-05-07T19:46:22.1552457Z 2025-05-07T19:46:22.1552460Z 2025-05-07T19:46:22.1552463Z 2025-05-07T19:46:22.1552467Z 2025-05-07T19:46:22.1552470Z 2025-05-07T19:46:22.1552479Z 2025-05-07T19:46:22.1552620Z  2025-05-07T19:46:22.1552820Z 2025-05-07T19:46:22.1552823Z 2025-05-07T19:46:22.1552827Z 2025-05-07T19:46:22.1552830Z 2025-05-07T19:46:22.1552852Z 2025-05-07T19:46:22.1552932Z 2025-05-07T19:46:22.1552936Z 2025-05-07T19:46:22.1552939Z 2025-05-07T19:46:22.1552943Z 2025-05-07T19:46:22.1552946Z 2025-05-07T19:46:22.1552950Z 2025-05-07T19:46:22.1552953Z 2025-05-07T19:46:22.1552956Z 2025-05-07T19:46:22.1552959Z 2025-05-07T19:46:22.1553111Z  2025-05-07T19:46:22.1553318Z 2025-05-07T19:46:22.1553340Z 2025-05-07T19:46:22.1553344Z 2025-05-07T19:46:22.1553347Z 2025-05-07T19:46:22.1553350Z 2025-05-07T19:46:22.1553354Z 2025-05-07T19:46:22.1553357Z 2025-05-07T19:46:22.1553360Z 2025-05-07T19:46:22.1553364Z 2025-05-07T19:46:22.1553367Z 2025-05-07T19:46:22.1553370Z 2025-05-07T19:46:22.1553373Z 2025-05-07T19:46:22.1553377Z 2025-05-07T19:46:22.1553380Z 2025-05-07T19:46:22.1553387Z 2025-05-07T19:46:22.1553539Z  2025-05-07T19:46:22.1553769Z 2025-05-07T19:46:22.1553773Z 2025-05-07T19:46:22.1553776Z 2025-05-07T19:46:22.1553780Z 2025-05-07T19:46:22.1553783Z 2025-05-07T19:46:22.1553790Z 2025-05-07T19:46:22.1553794Z 2025-05-07T19:46:22.1553797Z 2025-05-07T19:46:22.1553801Z 2025-05-07T19:46:22.1553804Z 2025-05-07T19:46:22.1553808Z 2025-05-07T19:46:22.1553811Z 2025-05-07T19:46:22.1553815Z 2025-05-07T19:46:22.1553818Z 2025-05-07T19:46:22.1553821Z 2025-05-07T19:46:22.1553824Z 2025-05-07T19:46:22.1554000Z  2025-05-07T19:46:22.1554217Z 2025-05-07T19:46:22.1554221Z 2025-05-07T19:46:22.1554224Z 2025-05-07T19:46:22.1554228Z 2025-05-07T19:46:22.1554231Z 2025-05-07T19:46:22.1554235Z 2025-05-07T19:46:22.1554238Z 2025-05-07T19:46:22.1554242Z 2025-05-07T19:46:22.1554245Z 2025-05-07T19:46:22.1554249Z 2025-05-07T19:46:22.1554252Z 2025-05-07T19:46:22.1554255Z 2025-05-07T19:46:22.1554262Z 2025-05-07T19:46:22.1554265Z 2025-05-07T19:46:22.1554269Z 2025-05-07T19:46:22.1554291Z 2025-05-07T19:46:22.1554295Z 2025-05-07T19:46:22.1554458Z  2025-05-07T19:46:22.1554679Z 2025-05-07T19:46:22.1554686Z 2025-05-07T19:46:22.1554690Z 2025-05-07T19:46:22.1554693Z 2025-05-07T19:46:22.1554696Z 2025-05-07T19:46:22.1554700Z 2025-05-07T19:46:22.1554703Z 2025-05-07T19:46:22.1554707Z 2025-05-07T19:46:22.1554710Z 2025-05-07T19:46:22.1554733Z 2025-05-07T19:46:22.1554736Z 2025-05-07T19:46:22.1554739Z 2025-05-07T19:46:22.1554743Z 2025-05-07T19:46:22.1554746Z 2025-05-07T19:46:22.1554750Z 2025-05-07T19:46:22.1554753Z 2025-05-07T19:46:22.1554756Z 2025-05-07T19:46:22.1554760Z 2025-05-07T19:46:22.1554931Z  2025-05-07T19:46:22.1555163Z 2025-05-07T19:46:22.1555184Z 2025-05-07T19:46:22.1555286Z  2025-05-07T19:46:22.1555392Z 2025-05-07T19:46:22.1555396Z 2025-05-07T19:46:22.1555503Z  2025-05-07T19:46:22.1555697Z 2025-05-07T19:46:22.1555701Z 2025-05-07T19:46:22.1555705Z 2025-05-07T19:46:22.1555813Z  2025-05-07T19:46:22.1555929Z 2025-05-07T19:46:22.1555933Z 2025-05-07T19:46:22.1555936Z 2025-05-07T19:46:22.1555944Z 2025-05-07T19:46:22.1556070Z  2025-05-07T19:46:22.1556195Z 2025-05-07T19:46:22.1556199Z 2025-05-07T19:46:22.1556202Z 2025-05-07T19:46:22.1556205Z 2025-05-07T19:46:22.1556209Z 2025-05-07T19:46:22.1556323Z  2025-05-07T19:46:22.1556472Z 2025-05-07T19:46:22.1556475Z 2025-05-07T19:46:22.1556479Z 2025-05-07T19:46:22.1556482Z 2025-05-07T19:46:22.1556486Z 2025-05-07T19:46:22.1556489Z 2025-05-07T19:46:22.1556605Z  2025-05-07T19:46:22.1556761Z 2025-05-07T19:46:22.1556765Z 2025-05-07T19:46:22.1556768Z 2025-05-07T19:46:22.1556772Z 2025-05-07T19:46:22.1556775Z 2025-05-07T19:46:22.1556779Z 2025-05-07T19:46:22.1556782Z 2025-05-07T19:46:22.1556900Z  2025-05-07T19:46:22.1557047Z 2025-05-07T19:46:22.1557056Z 2025-05-07T19:46:22.1557078Z 2025-05-07T19:46:22.1557081Z 2025-05-07T19:46:22.1557085Z 2025-05-07T19:46:22.1557088Z 2025-05-07T19:46:22.1557092Z 2025-05-07T19:46:22.1557095Z 2025-05-07T19:46:22.1557218Z  2025-05-07T19:46:22.1557445Z 2025-05-07T19:46:22.1557449Z 2025-05-07T19:46:22.1557452Z 2025-05-07T19:46:22.1557456Z 2025-05-07T19:46:22.1557459Z 2025-05-07T19:46:22.1557484Z 2025-05-07T19:46:22.1557487Z 2025-05-07T19:46:22.1557490Z 2025-05-07T19:46:22.1557494Z 2025-05-07T19:46:22.1557623Z  2025-05-07T19:46:22.1557791Z 2025-05-07T19:46:22.1557795Z 2025-05-07T19:46:22.1557798Z 2025-05-07T19:46:22.1557801Z 2025-05-07T19:46:22.1557805Z 2025-05-07T19:46:22.1557808Z 2025-05-07T19:46:22.1557812Z 2025-05-07T19:46:22.1557835Z 2025-05-07T19:46:22.1557838Z 2025-05-07T19:46:22.1557842Z 2025-05-07T19:46:22.1557973Z  2025-05-07T19:46:22.1558142Z 2025-05-07T19:46:22.1558146Z 2025-05-07T19:46:22.1558149Z 2025-05-07T19:46:22.1558158Z 2025-05-07T19:46:22.1558161Z 2025-05-07T19:46:22.1558164Z 2025-05-07T19:46:22.1558168Z 2025-05-07T19:46:22.1558172Z 2025-05-07T19:46:22.1558195Z 2025-05-07T19:46:22.1558198Z 2025-05-07T19:46:22.1558201Z 2025-05-07T19:46:22.1558345Z  2025-05-07T19:46:22.1558532Z 2025-05-07T19:46:22.1558536Z 2025-05-07T19:46:22.1558539Z 2025-05-07T19:46:22.1558543Z 2025-05-07T19:46:22.1558546Z 2025-05-07T19:46:22.1558550Z 2025-05-07T19:46:22.1558553Z 2025-05-07T19:46:22.1558558Z 2025-05-07T19:46:22.1558579Z 2025-05-07T19:46:22.1558582Z 2025-05-07T19:46:22.1558586Z 2025-05-07T19:46:22.1558589Z 2025-05-07T19:46:22.1558729Z  2025-05-07T19:46:22.1558917Z 2025-05-07T19:46:22.1558921Z 2025-05-07T19:46:22.1558925Z 2025-05-07T19:46:22.1558929Z 2025-05-07T19:46:22.1558932Z 2025-05-07T19:46:22.1558936Z 2025-05-07T19:46:22.1558939Z 2025-05-07T19:46:22.1558963Z 2025-05-07T19:46:22.1558966Z 2025-05-07T19:46:22.1558973Z 2025-05-07T19:46:22.1558978Z 2025-05-07T19:46:22.1558981Z 2025-05-07T19:46:22.1558985Z 2025-05-07T19:46:22.1559125Z  2025-05-07T19:46:22.1559326Z 2025-05-07T19:46:22.1559330Z 2025-05-07T19:46:22.1559337Z 2025-05-07T19:46:22.1559341Z 2025-05-07T19:46:22.1559363Z 2025-05-07T19:46:22.1559367Z 2025-05-07T19:46:22.1559370Z 2025-05-07T19:46:22.1559373Z 2025-05-07T19:46:22.1559376Z 2025-05-07T19:46:22.1559379Z 2025-05-07T19:46:22.1559383Z 2025-05-07T19:46:22.1559386Z 2025-05-07T19:46:22.1559389Z 2025-05-07T19:46:22.1559393Z 2025-05-07T19:46:22.1559544Z  2025-05-07T19:46:22.1559750Z 2025-05-07T19:46:22.1559773Z 2025-05-07T19:46:22.1559776Z 2025-05-07T19:46:22.1559780Z 2025-05-07T19:46:22.1559783Z 2025-05-07T19:46:22.1559786Z 2025-05-07T19:46:22.1559790Z 2025-05-07T19:46:22.1559793Z 2025-05-07T19:46:22.1559796Z 2025-05-07T19:46:22.1559800Z 2025-05-07T19:46:22.1559803Z 2025-05-07T19:46:22.1559806Z 2025-05-07T19:46:22.1559870Z 2025-05-07T19:46:22.1559873Z 2025-05-07T19:46:22.1559876Z 2025-05-07T19:46:22.1560029Z  2025-05-07T19:46:22.1560262Z 2025-05-07T19:46:22.1560265Z 2025-05-07T19:46:22.1560269Z 2025-05-07T19:46:22.1560276Z 2025-05-07T19:46:22.1560279Z 2025-05-07T19:46:22.1560283Z 2025-05-07T19:46:22.1560287Z 2025-05-07T19:46:22.1560290Z 2025-05-07T19:46:22.1560294Z 2025-05-07T19:46:22.1560297Z 2025-05-07T19:46:22.1560300Z 2025-05-07T19:46:22.1560303Z 2025-05-07T19:46:22.1560307Z 2025-05-07T19:46:22.1560310Z 2025-05-07T19:46:22.1560313Z 2025-05-07T19:46:22.1560317Z 2025-05-07T19:46:22.1560496Z  2025-05-07T19:46:22.1560714Z 2025-05-07T19:46:22.1560718Z 2025-05-07T19:46:22.1560721Z 2025-05-07T19:46:22.1560724Z 2025-05-07T19:46:22.1560728Z 2025-05-07T19:46:22.1560732Z 2025-05-07T19:46:22.1560735Z 2025-05-07T19:46:22.1560738Z 2025-05-07T19:46:22.1560742Z 2025-05-07T19:46:22.1560745Z 2025-05-07T19:46:22.1560753Z 2025-05-07T19:46:22.1560757Z 2025-05-07T19:46:22.1560761Z 2025-05-07T19:46:22.1560764Z 2025-05-07T19:46:22.1560768Z 2025-05-07T19:46:22.1560789Z 2025-05-07T19:46:22.1560792Z 2025-05-07T19:46:22.1561013Z  2025-05-07T19:46:22.1561236Z 2025-05-07T19:46:22.1561240Z 2025-05-07T19:46:22.1561244Z 2025-05-07T19:46:22.1561247Z 2025-05-07T19:46:22.1561250Z 2025-05-07T19:46:22.1561254Z 2025-05-07T19:46:22.1561258Z 2025-05-07T19:46:22.1561262Z 2025-05-07T19:46:22.1561287Z 2025-05-07T19:46:22.1561291Z 2025-05-07T19:46:22.1561294Z 2025-05-07T19:46:22.1561297Z 2025-05-07T19:46:22.1561301Z 2025-05-07T19:46:22.1561304Z 2025-05-07T19:46:22.1561308Z 2025-05-07T19:46:22.1561311Z 2025-05-07T19:46:22.1561314Z 2025-05-07T19:46:22.1561318Z 2025-05-07T19:46:22.1561491Z  2025-05-07T19:46:22.1561719Z 2025-05-07T19:46:22.1561740Z 2025-05-07T19:46:22.1561843Z  2025-05-07T19:46:22.1561955Z 2025-05-07T19:46:22.1561958Z 2025-05-07T19:46:22.1562066Z  2025-05-07T19:46:22.1562200Z 2025-05-07T19:46:22.1562204Z 2025-05-07T19:46:22.1562208Z 2025-05-07T19:46:22.1562315Z  2025-05-07T19:46:22.1562433Z 2025-05-07T19:46:22.1562440Z 2025-05-07T19:46:22.1562444Z 2025-05-07T19:46:22.1562448Z 2025-05-07T19:46:22.1562577Z  2025-05-07T19:46:22.1562701Z 2025-05-07T19:46:22.1562704Z 2025-05-07T19:46:22.1562707Z 2025-05-07T19:46:22.1562711Z 2025-05-07T19:46:22.1562714Z 2025-05-07T19:46:22.1562851Z  2025-05-07T19:46:22.1562984Z 2025-05-07T19:46:22.1562988Z 2025-05-07T19:46:22.1562991Z 2025-05-07T19:46:22.1562995Z 2025-05-07T19:46:22.1562998Z 2025-05-07T19:46:22.1563002Z 2025-05-07T19:46:22.1563113Z  2025-05-07T19:46:22.1563269Z 2025-05-07T19:46:22.1563273Z 2025-05-07T19:46:22.1563276Z 2025-05-07T19:46:22.1563280Z 2025-05-07T19:46:22.1563283Z 2025-05-07T19:46:22.1563288Z 2025-05-07T19:46:22.1563291Z 2025-05-07T19:46:22.1563408Z  2025-05-07T19:46:22.1563556Z 2025-05-07T19:46:22.1563559Z 2025-05-07T19:46:22.1563581Z 2025-05-07T19:46:22.1563584Z 2025-05-07T19:46:22.1563588Z 2025-05-07T19:46:22.1563591Z 2025-05-07T19:46:22.1563595Z 2025-05-07T19:46:22.1563601Z 2025-05-07T19:46:22.1563741Z  done 2025-05-07T19:46:22.3664948Z Preparing transaction: / - done 2025-05-07T19:46:23.0683725Z Verifying transaction: | / - \ | / - done 2025-05-07T19:46:23.3735086Z Executing transaction: | / - done 2025-05-07T19:46:25.3376646Z [INSTALL] Fixing file placements for CUDA 12.8.0+ ... 2025-05-07T19:46:25.3377116Z [INSTALL] Creating symlinks: libnvToolsExt.so 2025-05-07T19:46:25.3377867Z + ln -sf /github/home/miniconda/envs/build_binary/lib/libnvToolsExt.so.1 /github/home/miniconda/envs/build_binary/lib/libnvToolsExt.so 2025-05-07T19:46:25.3378495Z 2025-05-07T19:46:25.3398997Z 2025-05-07T19:46:25.3399840Z + ln -sf /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvToolsExt.so.1 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvToolsExt.so 2025-05-07T19:46:25.3400974Z 2025-05-07T19:46:25.3416704Z 2025-05-07T19:46:25.3416959Z [INSTALL] Copying nvtx3 headers ... 2025-05-07T19:46:25.3421378Z + cp -r /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCuda.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCudaRt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtOpenCL.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtSync.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtx3.hpp /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtxDetail /github/home/miniconda/envs/build_binary/include/ 2025-05-07T19:46:25.3425714Z 2025-05-07T19:46:25.3656491Z 2025-05-07T19:46:25.3665724Z + cp -r /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCuda.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCudaRt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtOpenCL.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtSync.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtx3.hpp /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtxDetail /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include/ 2025-05-07T19:46:25.3669751Z 2025-05-07T19:46:25.3691777Z 2025-05-07T19:46:25.3692582Z [INSTALL] Appending libcuda.so path to LD_LIBRARY_PATH ... 2025-05-07T19:46:25.4081300Z [ENV] Appending to LD_LIBRARY_PATH: /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs ... 2025-05-07T19:46:27.2704273Z + conda env config vars set -n build_binary LD_LIBRARY_PATH=/github/home/miniconda/envs/build_binary/lib:/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs 2025-05-07T19:46:27.2705065Z 2025-05-07T19:46:27.6859782Z 2025-05-07T19:46:27.6875232Z [INSTALL] Setting environment variable NVML_LIB_PATH ... 2025-05-07T19:46:27.7266002Z + conda env config vars set -n build_binary NVML_LIB_PATH=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:46:27.7266571Z 2025-05-07T19:46:28.1330823Z 2025-05-07T19:46:28.1331688Z [INSTALL] Setting environment variable CUDA_INCLUDE_DIRS ... 2025-05-07T19:46:28.1333819Z + conda env config vars set -n build_binary CUDA_INCLUDE_DIRS="/github/home/miniconda/envs/build_binary/include/:/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include/" 2025-05-07T19:46:28.1334608Z 2025-05-07T19:46:28.5413639Z 2025-05-07T19:46:30.5312064Z [CHECK] cuda_runtime.h found in CONDA_PREFIX PATH (file): /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include/cuda_runtime.h 2025-05-07T19:46:32.4763227Z [CHECK] libcuda.so found in CONDA_PREFIX PATH (file): /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:46:34.4168710Z [CHECK] libnvToolsExt.so found in CONDA_PREFIX PATH (symbolic link): /github/home/miniconda/envs/build_binary/lib/libnvToolsExt.so 2025-05-07T19:46:34.4169582Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvToolsExt.so 2025-05-07T19:46:36.3801459Z [CHECK] libnvidia-ml.so found in CONDA_PREFIX PATH (file): /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libnvidia-ml.so 2025-05-07T19:46:38.2197209Z /github/home/miniconda/envs/build_binary/bin/nvcc 2025-05-07T19:46:38.2198068Z 2025-05-07T19:46:38.2768588Z [CHECK] Binary nvcc found in PATH 2025-05-07T19:46:42.0465508Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:46:42.0466232Z Target: x86_64-conda-linux-gnu 2025-05-07T19:46:42.0466540Z Thread model: posix 2025-05-07T19:46:42.0466916Z InstalledDir: /github/home/miniconda/envs/build_binary/bin 2025-05-07T19:46:42.0467572Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang.cfg 2025-05-07T19:46:42.0468078Z 2025-05-07T19:46:42.1218013Z [INSTALL] Resetting compiler symlinks to clang ... 2025-05-07T19:46:45.9470621Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:46:45.9472153Z 2025-05-07T19:46:45.9485333Z 2025-05-07T19:46:45.9501527Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:46:45.9503005Z 2025-05-07T19:46:45.9516993Z 2025-05-07T19:46:45.9540828Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang++ /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:46:45.9541380Z 2025-05-07T19:46:45.9559030Z 2025-05-07T19:46:45.9578085Z + ln -sf /github/home/miniconda/envs/build_binary/bin/clang++ /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:46:45.9578643Z 2025-05-07T19:46:45.9595226Z 2025-05-07T19:46:45.9595889Z + ls -la /github/home/miniconda/envs/build_binary/etc/conda/activate.d 2025-05-07T19:46:45.9596271Z 2025-05-07T19:46:45.9614311Z total 56 2025-05-07T19:46:45.9614624Z drwxr-xr-x. 2 root root 16384 May 7 19:46 . 2025-05-07T19:46:45.9614985Z drwxr-xr-x. 5 root root 62 May 7 19:44 .. 2025-05-07T19:46:45.9615582Z -rw-r--r--. 2 root root 3778 Jun 10 2024 activate-binutils_linux-64.sh 2025-05-07T19:46:45.9616100Z -rw-r--r--. 2 root root 11630 Jun 10 2024 activate-gcc_linux-64.sh 2025-05-07T19:46:45.9616587Z -rw-r--r--. 2 root root 5190 Jun 10 2024 activate-gxx_linux-64.sh 2025-05-07T19:46:45.9617044Z -rw-r--r--. 2 root root 136 Mar 27 01:27 libglib_activate.sh 2025-05-07T19:46:45.9617492Z -rw-r--r--. 2 root root 873 Jun 5 2024 libxml2_activate.sh 2025-05-07T19:46:45.9617936Z -rw-r--r--. 2 root root 499 Nov 30 04:26 openjdk_activate.sh 2025-05-07T19:46:45.9618363Z -rw-r--r--. 2 root root 2932 Jan 24 22:22 ~cuda-nvcc_activate.sh 2025-05-07T19:46:45.9618639Z 2025-05-07T19:46:45.9618886Z [INSTALL] Removing the -ccbin=CXX hook from NVCC activation scripts ... 2025-05-07T19:46:45.9619557Z + sed -i /-ccbin=/d /github/home/miniconda/envs/build_binary/etc/conda/activate.d/*cuda-nvcc_activate.sh 2025-05-07T19:46:45.9620020Z 2025-05-07T19:46:45.9642597Z 2025-05-07T19:46:45.9643181Z + conda run -n build_binary c++ --version | grep -i clang 2025-05-07T19:46:45.9643482Z 2025-05-07T19:46:47.8582884Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:46:47.8585497Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang++.cfg 2025-05-07T19:46:47.8587472Z 2025-05-07T19:46:47.8587625Z [BUILD] Setting Clang as the NVCC host compiler: 2025-05-07T19:46:49.7148117Z [BUILD] Setting prepend flags for NVCC ... 2025-05-07T19:46:49.7149254Z + conda env config vars set -n build_binary NVCC_PREPEND_FLAGS="-allow-unsupported-compiler -Xcompiler -stdlib=libstdc++ -ccbin /github/home/miniconda/envs/build_binary/bin/c++" 2025-05-07T19:46:49.7150037Z 2025-05-07T19:46:50.1419365Z 2025-05-07T19:46:50.1420493Z + conda run -n build_binary printenv NVCC_PREPEND_FLAGS 2025-05-07T19:46:50.1421347Z 2025-05-07T19:46:51.9889625Z -allow-unsupported-compiler -Xcompiler -stdlib=libstdc++ -ccbin /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:46:51.9891263Z 2025-05-07T19:46:52.0641037Z 2025-05-07T19:46:52.0642299Z [INFO] Printing out all preprocessor defines in nvcc ... 2025-05-07T19:46:52.0642842Z + conda run -n build_binary nvcc --compiler-options -dM -E -x cu - < /dev/null 2025-05-07T19:46:52.0643162Z 2025-05-07T19:46:53.9595618Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:46:53.9596568Z #define ADJ_ESTERROR 0x0008 2025-05-07T19:46:53.9597067Z 2025-05-07T19:46:53.9597174Z #define ADJ_FREQUENCY 0x0002 2025-05-07T19:46:53.9597447Z #define ADJ_MAXERROR 0x0004 2025-05-07T19:46:53.9597765Z #define ADJ_MICRO 0x1000 2025-05-07T19:46:53.9598025Z #define ADJ_NANO 0x2000 2025-05-07T19:46:53.9598317Z #define ADJ_OFFSET 0x0001 2025-05-07T19:46:53.9598623Z #define ADJ_OFFSET_SINGLESHOT 0x8001 2025-05-07T19:46:53.9598937Z #define ADJ_OFFSET_SS_READ 0xa001 2025-05-07T19:46:53.9599269Z #define ADJ_STATUS 0x0010 2025-05-07T19:46:53.9599550Z #define ADJ_TAI 0x0080 2025-05-07T19:46:53.9599842Z #define ADJ_TICK 0x4000 2025-05-07T19:46:53.9600087Z #define ADJ_TIMECONST 0x0020 2025-05-07T19:46:53.9600376Z #define AIO_PRIO_DELTA_MAX 20 2025-05-07T19:46:53.9600886Z #define BC_BASE_MAX _POSIX2_BC_BASE_MAX 2025-05-07T19:46:53.9601219Z #define BC_DIM_MAX _POSIX2_BC_DIM_MAX 2025-05-07T19:46:53.9601525Z #define BC_SCALE_MAX _POSIX2_BC_SCALE_MAX 2025-05-07T19:46:53.9601862Z #define BC_STRING_MAX _POSIX2_BC_STRING_MAX 2025-05-07T19:46:53.9602186Z #define BIG_ENDIAN __BIG_ENDIAN 2025-05-07T19:46:53.9602453Z #define BUFSIZ _IO_BUFSIZ 2025-05-07T19:46:53.9602719Z #define BYTE_ORDER __BYTE_ORDER 2025-05-07T19:46:53.9602989Z #define CHARCLASS_NAME_MAX 2048 2025-05-07T19:46:53.9603269Z #define CHAR_BIT __CHAR_BIT__ 2025-05-07T19:46:53.9603541Z #define CHAR_MAX __SCHAR_MAX__ 2025-05-07T19:46:53.9603825Z #define CHAR_MIN SCHAR_MIN 2025-05-07T19:46:53.9604084Z #define CLOCKS_PER_SEC 1000000l 2025-05-07T19:46:53.9604363Z #define CLOCK_BOOTTIME 7 2025-05-07T19:46:53.9604614Z #define CLOCK_BOOTTIME_ALARM 9 2025-05-07T19:46:53.9604901Z #define CLOCK_MONOTONIC 1 2025-05-07T19:46:53.9605177Z #define CLOCK_MONOTONIC_COARSE 6 2025-05-07T19:46:53.9605453Z #define CLOCK_MONOTONIC_RAW 4 2025-05-07T19:46:53.9605752Z #define CLOCK_PROCESS_CPUTIME_ID 2 2025-05-07T19:46:53.9606034Z #define CLOCK_REALTIME 0 2025-05-07T19:46:53.9606404Z #define CLOCK_REALTIME_ALARM 8 2025-05-07T19:46:53.9606671Z #define CLOCK_REALTIME_COARSE 5 2025-05-07T19:46:53.9606942Z #define CLOCK_TAI 11 2025-05-07T19:46:53.9607280Z #define CLOCK_THREAD_CPUTIME_ID 3 2025-05-07T19:46:53.9607734Z #define COLL_WEIGHTS_MAX 255 2025-05-07T19:46:53.9607987Z #define CUDARTAPI 2025-05-07T19:46:53.9608225Z #define CUDARTAPI_CDECL 2025-05-07T19:46:53.9608480Z #define CUDART_CB 2025-05-07T19:46:53.9608880Z #define CUDART_DEVICE __device__ 2025-05-07T19:46:53.9609174Z #define CUDART_VERSION 12080 2025-05-07T19:46:53.9609449Z #define CUDA_DOUBLE_MATH_FUNCTIONS 1 2025-05-07T19:46:53.9609761Z #define CUDA_IPC_HANDLE_SIZE 64 2025-05-07T19:46:53.9610043Z #define CU_UUID_HAS_BEEN_DEFINED 2025-05-07T19:46:53.9610345Z #define DELAYTIMER_MAX 2147483647 2025-05-07T19:46:53.9610615Z #define DOMAIN 1 2025-05-07T19:46:53.9610851Z #define EOF (-1) 2025-05-07T19:46:53.9611080Z #define EXIT_FAILURE 1 2025-05-07T19:46:53.9611339Z #define EXIT_SUCCESS 0 2025-05-07T19:46:53.9611605Z #define EXPR_NEST_MAX _POSIX2_EXPR_NEST_MAX 2025-05-07T19:46:53.9611969Z #define FD_CLR(fd,fdsetp) __FD_CLR (fd, fdsetp) 2025-05-07T19:46:53.9612349Z #define FD_ISSET(fd,fdsetp) __FD_ISSET (fd, fdsetp) 2025-05-07T19:46:53.9612709Z #define FD_SET(fd,fdsetp) __FD_SET (fd, fdsetp) 2025-05-07T19:46:53.9613054Z #define FD_SETSIZE __FD_SETSIZE 2025-05-07T19:46:53.9613342Z #define FD_ZERO(fdsetp) __FD_ZERO (fdsetp) 2025-05-07T19:46:53.9613671Z #define FILENAME_MAX 4096 2025-05-07T19:46:53.9613939Z #define FOPEN_MAX 16 2025-05-07T19:46:53.9614240Z #define FP_ILOGB0 (-2147483647 - 1) 2025-05-07T19:46:53.9614553Z #define FP_ILOGBNAN (-2147483647 - 1) 2025-05-07T19:46:53.9614876Z #define FP_INFINITE 1 2025-05-07T19:46:53.9615410Z #define FP_NAN 0 2025-05-07T19:46:53.9615646Z #define FP_NORMAL 4 2025-05-07T19:46:53.9615918Z #define FP_SUBNORMAL 3 2025-05-07T19:46:53.9616171Z #define FP_ZERO 2 2025-05-07T19:46:53.9616482Z #define HOST_NAME_MAX 64 2025-05-07T19:46:53.9616748Z #define HUGE 3.40282347e+38F 2025-05-07T19:46:53.9617055Z #define HUGE_VAL (__builtin_huge_val()) 2025-05-07T19:46:53.9617387Z #define HUGE_VALF (__builtin_huge_valf()) 2025-05-07T19:46:53.9617749Z #define HUGE_VALL (__builtin_huge_vall()) 2025-05-07T19:46:53.9618082Z #define INFINITY (__builtin_inff()) 2025-05-07T19:46:53.9618414Z #define INT_MAX __INT_MAX__ 2025-05-07T19:46:53.9618724Z #define INT_MIN (-__INT_MAX__ -1) 2025-05-07T19:46:53.9618995Z #define IOV_MAX 1024 2025-05-07T19:46:53.9619425Z #define LINE_MAX _POSIX2_LINE_MAX 2025-05-07T19:46:53.9619712Z #define LITTLE_ENDIAN __LITTLE_ENDIAN 2025-05-07T19:46:53.9620019Z #define LLONG_MAX __LONG_LONG_MAX__ 2025-05-07T19:46:53.9620319Z #define LLONG_MIN (-__LONG_LONG_MAX__-1LL) 2025-05-07T19:46:53.9620638Z #define LOGIN_NAME_MAX 256 2025-05-07T19:46:53.9620885Z #define LONG_BIT 64 2025-05-07T19:46:53.9621132Z #define LONG_LONG_MAX __LONG_LONG_MAX__ 2025-05-07T19:46:53.9621528Z #define LONG_LONG_MIN (-__LONG_LONG_MAX__-1LL) 2025-05-07T19:46:53.9621958Z #define LONG_MAX __LONG_MAX__ 2025-05-07T19:46:53.9622228Z #define LONG_MIN (-__LONG_MAX__ -1L) 2025-05-07T19:46:53.9622492Z #define L_ctermid 9 2025-05-07T19:46:53.9622716Z #define L_cuserid 9 2025-05-07T19:46:53.9622923Z #define L_tmpnam 20 2025-05-07T19:46:53.9623153Z #define MATH_ERREXCEPT 2 2025-05-07T19:46:53.9623380Z #define MATH_ERRNO 1 2025-05-07T19:46:53.9623605Z #define MAX_CANON 255 2025-05-07T19:46:53.9623817Z #define MAX_INPUT 255 2025-05-07T19:46:53.9624077Z #define MB_CUR_MAX (__ctype_get_mb_cur_max ()) 2025-05-07T19:46:53.9624363Z #define MB_LEN_MAX 16 2025-05-07T19:46:53.9624607Z #define MOD_CLKA ADJ_OFFSET_SINGLESHOT 2025-05-07T19:46:53.9624877Z #define MOD_CLKB ADJ_TICK 2025-05-07T19:46:53.9625127Z #define MOD_ESTERROR ADJ_ESTERROR 2025-05-07T19:46:53.9625412Z #define MOD_FREQUENCY ADJ_FREQUENCY 2025-05-07T19:46:53.9625682Z #define MOD_MAXERROR ADJ_MAXERROR 2025-05-07T19:46:53.9625953Z #define MOD_MICRO ADJ_MICRO 2025-05-07T19:46:53.9626196Z #define MOD_NANO ADJ_NANO 2025-05-07T19:46:53.9626443Z #define MOD_OFFSET ADJ_OFFSET 2025-05-07T19:46:53.9626690Z #define MOD_STATUS ADJ_STATUS 2025-05-07T19:46:53.9626942Z #define MOD_TAI ADJ_TAI 2025-05-07T19:46:53.9627173Z #define MOD_TIMECONST ADJ_TIMECONST 2025-05-07T19:46:53.9627450Z #define MQ_PRIO_MAX 32768 2025-05-07T19:46:53.9627684Z #define M_1_PI 0.31830988618379067154 2025-05-07T19:46:53.9627992Z #define M_1_PIl 0.318309886183790671537767526745028724L 2025-05-07T19:46:53.9628312Z #define M_2_PI 0.63661977236758134308 2025-05-07T19:46:53.9628602Z #define M_2_PIl 0.636619772367581343075535053490057448L 2025-05-07T19:46:53.9628925Z #define M_2_SQRTPI 1.12837916709551257390 2025-05-07T19:46:53.9629246Z #define M_2_SQRTPIl 1.128379167095512573896158903121545172L 2025-05-07T19:46:53.9629585Z #define M_E 2.7182818284590452354 2025-05-07T19:46:53.9629862Z #define M_El 2.718281828459045235360287471352662498L 2025-05-07T19:46:53.9630172Z #define M_LN10 2.30258509299404568402 2025-05-07T19:46:53.9630471Z #define M_LN10l 2.302585092994045684017991454684364208L 2025-05-07T19:46:53.9630786Z #define M_LN2 0.69314718055994530942 2025-05-07T19:46:53.9631084Z #define M_LN2l 0.693147180559945309417232121458176568L 2025-05-07T19:46:53.9631384Z #define M_LOG10E 0.43429448190325182765 2025-05-07T19:46:53.9631706Z #define M_LOG10El 0.434294481903251827651128918916605082L 2025-05-07T19:46:53.9632020Z #define M_LOG2E 1.4426950408889634074 2025-05-07T19:46:53.9632331Z #define M_LOG2El 1.442695040888963407359924681001892137L 2025-05-07T19:46:53.9632639Z #define M_PI 3.14159265358979323846 2025-05-07T19:46:53.9632912Z #define M_PI_2 1.57079632679489661923 2025-05-07T19:46:53.9633211Z #define M_PI_2l 1.570796326794896619231321691639751442L 2025-05-07T19:46:53.9633528Z #define M_PI_4 0.78539816339744830962 2025-05-07T19:46:53.9633921Z #define M_PI_4l 0.785398163397448309615660845819875721L 2025-05-07T19:46:53.9634255Z #define M_PIl 3.141592653589793238462643383279502884L 2025-05-07T19:46:53.9634578Z #define M_SQRT1_2 0.70710678118654752440 2025-05-07T19:46:53.9634891Z #define M_SQRT1_2l 0.707106781186547524400844362104849039L 2025-05-07T19:46:53.9635225Z #define M_SQRT2 1.41421356237309504880 2025-05-07T19:46:53.9635530Z #define M_SQRT2l 1.414213562373095048801688724209698079L 2025-05-07T19:46:53.9635849Z #define NAME_MAX 255 2025-05-07T19:46:53.9636073Z #define NAN (__builtin_nanf ("")) 2025-05-07T19:46:53.9636362Z #define NFDBITS __NFDBITS 2025-05-07T19:46:53.9636610Z #define NGROUPS_MAX 65536 2025-05-07T19:46:53.9636848Z #define NL_ARGMAX _POSIX_ARG_MAX 2025-05-07T19:46:53.9637126Z #define NL_LANGMAX _POSIX2_LINE_MAX 2025-05-07T19:46:53.9637390Z #define NL_MSGMAX INT_MAX 2025-05-07T19:46:53.9637633Z #define NL_NMAX INT_MAX 2025-05-07T19:46:53.9637856Z #define NL_SETMAX INT_MAX 2025-05-07T19:46:53.9638113Z #define NL_TEXTMAX INT_MAX 2025-05-07T19:46:53.9638345Z #define NULL __null 2025-05-07T19:46:53.9638565Z #define NZERO 20 2025-05-07T19:46:53.9638767Z #define OVERFLOW 3 2025-05-07T19:46:53.9638999Z #define PATH_MAX 4096 2025-05-07T19:46:53.9639313Z #define PDP_ENDIAN __PDP_ENDIAN 2025-05-07T19:46:53.9639563Z #define PIPE_BUF 4096 2025-05-07T19:46:53.9639795Z #define PLOSS 6 2025-05-07T19:46:53.9640130Z #define PTHREAD_DESTRUCTOR_ITERATIONS _POSIX_THREAD_DESTRUCTOR_ITERATIONS 2025-05-07T19:46:53.9640738Z #define PTHREAD_KEYS_MAX 1024 2025-05-07T19:46:53.9641009Z #define PTHREAD_STACK_MIN 16384 2025-05-07T19:46:53.9641296Z #define P_tmpdir "/tmp" 2025-05-07T19:46:53.9641535Z #define RAND_MAX 2147483647 2025-05-07T19:46:53.9641979Z #define RE_DUP_MAX (0x7fff) 2025-05-07T19:46:53.9642233Z #define RTSIG_MAX 32 2025-05-07T19:46:53.9642490Z #define SCHAR_MAX __SCHAR_MAX__ 2025-05-07T19:46:53.9642786Z #define SCHAR_MIN (-__SCHAR_MAX__-1) 2025-05-07T19:46:53.9643070Z #define SEEK_CUR 1 2025-05-07T19:46:53.9643309Z #define SEEK_DATA 3 2025-05-07T19:46:53.9643523Z #define SEEK_END 2 2025-05-07T19:46:53.9643752Z #define SEEK_HOLE 4 2025-05-07T19:46:53.9643966Z #define SEEK_SET 0 2025-05-07T19:46:53.9644207Z #define SEM_VALUE_MAX (2147483647) 2025-05-07T19:46:53.9644492Z #define SHRT_MAX __SHRT_MAX__ 2025-05-07T19:46:53.9644774Z #define SHRT_MIN (-__SHRT_MAX__ -1) 2025-05-07T19:46:53.9645044Z #define SING 2 2025-05-07T19:46:53.9645273Z #define SSIZE_MAX LONG_MAX 2025-05-07T19:46:53.9645520Z #define STA_CLK 0x8000 2025-05-07T19:46:53.9645773Z #define STA_CLOCKERR 0x1000 2025-05-07T19:46:53.9646140Z #define STA_DEL 0x0020 2025-05-07T19:46:53.9646363Z #define STA_FLL 0x0008 2025-05-07T19:46:53.9646607Z #define STA_FREQHOLD 0x0080 2025-05-07T19:46:53.9646853Z #define STA_INS 0x0010 2025-05-07T19:46:53.9647093Z #define STA_MODE 0x4000 2025-05-07T19:46:53.9647426Z #define STA_NANO 0x2000 2025-05-07T19:46:53.9647657Z #define STA_PLL 0x0001 2025-05-07T19:46:53.9647881Z #define STA_PPSERROR 0x0800 2025-05-07T19:46:53.9648138Z #define STA_PPSFREQ 0x0002 2025-05-07T19:46:53.9648376Z #define STA_PPSJITTER 0x0200 2025-05-07T19:46:53.9648636Z #define STA_PPSSIGNAL 0x0100 2025-05-07T19:46:53.9648881Z #define STA_PPSTIME 0x0004 2025-05-07T19:46:53.9649135Z #define STA_PPSWANDER 0x0400 2025-05-07T19:46:53.9649675Z #define STA_RONLY (STA_PPSSIGNAL | STA_PPSJITTER | STA_PPSWANDER | STA_PPSERROR | STA_CLOCKERR | STA_NANO | STA_MODE | STA_CLK) 2025-05-07T19:46:53.9650227Z #define STA_UNSYNC 0x0040 2025-05-07T19:46:53.9650474Z #define TIMER_ABSTIME 1 2025-05-07T19:46:53.9650692Z #define TIME_UTC 1 2025-05-07T19:46:53.9650906Z #define TLOSS 5 2025-05-07T19:46:53.9651104Z #define TMP_MAX 238328 2025-05-07T19:46:53.9651333Z #define TTY_NAME_MAX 32 2025-05-07T19:46:53.9651565Z #define UCHAR_MAX (__SCHAR_MAX__*2 +1) 2025-05-07T19:46:53.9651859Z #define UINT_MAX (__INT_MAX__ *2U +1U) 2025-05-07T19:46:53.9652171Z #define ULLONG_MAX (__LONG_LONG_MAX__*2ULL+1ULL) 2025-05-07T19:46:53.9652511Z #define ULONG_LONG_MAX (__LONG_LONG_MAX__*2ULL+1ULL) 2025-05-07T19:46:53.9652929Z #define ULONG_MAX (__LONG_MAX__ *2UL+1UL) 2025-05-07T19:46:53.9653205Z #define UNDERFLOW 4 2025-05-07T19:46:53.9653442Z #define USHRT_MAX (__SHRT_MAX__ *2 +1) 2025-05-07T19:46:53.9653706Z #define WCONTINUED 8 2025-05-07T19:46:53.9653937Z #define WEXITED 4 2025-05-07T19:46:53.9654232Z #define WEXITSTATUS(status) __WEXITSTATUS (__WAIT_INT (status)) 2025-05-07T19:46:53.9654694Z #define WIFCONTINUED(status) __WIFCONTINUED (__WAIT_INT (status)) 2025-05-07T19:46:53.9655127Z #define WIFEXITED(status) __WIFEXITED (__WAIT_INT (status)) 2025-05-07T19:46:53.9655809Z #define WIFSIGNALED(status) __WIFSIGNALED (__WAIT_INT (status)) 2025-05-07T19:46:53.9656291Z #define WIFSTOPPED(status) __WIFSTOPPED (__WAIT_INT (status)) 2025-05-07T19:46:53.9656656Z #define WNOHANG 1 2025-05-07T19:46:53.9656897Z #define WNOWAIT 0x01000000 2025-05-07T19:46:53.9657142Z #define WORD_BIT 32 2025-05-07T19:46:53.9657373Z #define WSTOPPED 2 2025-05-07T19:46:53.9657661Z #define WSTOPSIG(status) __WSTOPSIG (__WAIT_INT (status)) 2025-05-07T19:46:53.9658099Z #define WTERMSIG(status) __WTERMSIG (__WAIT_INT (status)) 2025-05-07T19:46:53.9658444Z #define WUNTRACED 2 2025-05-07T19:46:53.9658687Z #define XATTR_LIST_MAX 65536 2025-05-07T19:46:53.9659063Z #define XATTR_NAME_MAX 255 2025-05-07T19:46:53.9659317Z #define XATTR_SIZE_MAX 65536 2025-05-07T19:46:53.9659604Z #define X_TLOSS 1.41484755040568800000e+16 2025-05-07T19:46:53.9659890Z #define _ACRTIMP 2025-05-07T19:46:53.9660119Z #define _ALLOCA_H 1 2025-05-07T19:46:53.9660336Z #define _ASSERT_H 1 2025-05-07T19:46:53.9660573Z #define _ATFILE_SOURCE 1 2025-05-07T19:46:53.9660820Z #define _BITS_BYTESWAP_H 1 2025-05-07T19:46:53.9661089Z #define _BITS_POSIX1_LIM_H 1 2025-05-07T19:46:53.9661352Z #define _BITS_POSIX2_LIM_H 1 2025-05-07T19:46:53.9661634Z #define _BITS_PTHREADTYPES_H 1 2025-05-07T19:46:53.9661917Z #define _BITS_TIMEX_H 1 2025-05-07T19:46:53.9662158Z #define _BITS_TIME_H 1 2025-05-07T19:46:53.9662414Z #define _BITS_TYPESIZES_H 1 2025-05-07T19:46:53.9662664Z #define _BITS_TYPES_H 1 2025-05-07T19:46:53.9662917Z #define _BSD_SOURCE 1 2025-05-07T19:46:53.9663154Z #define _CONCEPT_CHECK_H 1 2025-05-07T19:46:53.9663423Z #define _CPP_TYPE_TRAITS_H 1 2025-05-07T19:46:53.9663674Z #define _CRTIMP 2025-05-07T19:46:53.9663903Z #define _CTYPE_H 1 2025-05-07T19:46:53.9664117Z #define _ENDIAN_H 1 2025-05-07T19:46:53.9664359Z #define _EXCEPTION_DEFINES_H 1 2025-05-07T19:46:53.9664631Z #define _EXT_NUMERIC_TRAITS 1 2025-05-07T19:46:53.9664909Z #define _EXT_TYPE_TRAITS 1 2025-05-07T19:46:53.9665171Z #define _FEATURES_H 1 2025-05-07T19:46:53.9665408Z #define _FUNCTEXCEPT_H 1 2025-05-07T19:46:53.9665672Z #define _GCC_LIMITS_H_ 2025-05-07T19:46:53.9665953Z #define _GLIBCXX11_DEPRECATED _GLIBCXX_DEPRECATED 2025-05-07T19:46:53.9666441Z #define _GLIBCXX11_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT) 2025-05-07T19:46:53.9666887Z #define _GLIBCXX11_USE_C99_COMPLEX 1 2025-05-07T19:46:53.9667198Z #define _GLIBCXX11_USE_C99_MATH 1 2025-05-07T19:46:53.9667501Z #define _GLIBCXX11_USE_C99_STDIO 1 2025-05-07T19:46:53.9667922Z #define _GLIBCXX11_USE_C99_STDLIB 1 2025-05-07T19:46:53.9668365Z #define _GLIBCXX11_USE_C99_WCHAR 1 2025-05-07T19:46:53.9668670Z #define _GLIBCXX14_CONSTEXPR constexpr 2025-05-07T19:46:53.9668998Z #define _GLIBCXX17_CONSTEXPR constexpr 2025-05-07T19:46:53.9669325Z #define _GLIBCXX17_DEPRECATED [[__deprecated__]] 2025-05-07T19:46:53.9669977Z #define _GLIBCXX17_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT) 2025-05-07T19:46:53.9670422Z #define _GLIBCXX17_INLINE inline 2025-05-07T19:46:53.9670716Z #define _GLIBCXX20_CONSTEXPR 2025-05-07T19:46:53.9670992Z #define _GLIBCXX20_DEPRECATED(MSG) 2025-05-07T19:46:53.9671317Z #define _GLIBCXX20_DEPRECATED_SUGGEST(ALT) 2025-05-07T19:46:53.9671636Z #define _GLIBCXX98_USE_C99_COMPLEX 1 2025-05-07T19:46:53.9671947Z #define _GLIBCXX98_USE_C99_MATH 1 2025-05-07T19:46:53.9672247Z #define _GLIBCXX98_USE_C99_STDIO 1 2025-05-07T19:46:53.9672531Z #define _GLIBCXX98_USE_C99_STDLIB 1 2025-05-07T19:46:53.9672835Z #define _GLIBCXX98_USE_C99_WCHAR 1 2025-05-07T19:46:53.9676358Z #define _GLIBCXX_ABI_TAG_CXX11 __attribute ((__abi_tag__ ("cxx11"))) 2025-05-07T19:46:53.9676776Z #define _GLIBCXX_ATOMIC_BUILTINS 1 2025-05-07T19:46:53.9677087Z #define _GLIBCXX_BEGIN_EXTERN_C extern "C" { 2025-05-07T19:46:53.9677437Z #define _GLIBCXX_BEGIN_NAMESPACE_ALGO 2025-05-07T19:46:53.9677755Z #define _GLIBCXX_BEGIN_NAMESPACE_CONTAINER 2025-05-07T19:46:53.9678144Z #define _GLIBCXX_BEGIN_NAMESPACE_CXX11 namespace __cxx11 { 2025-05-07T19:46:53.9678524Z #define _GLIBCXX_BEGIN_NAMESPACE_LDBL 2025-05-07T19:46:53.9678953Z #define _GLIBCXX_BEGIN_NAMESPACE_LDBL_OR_CXX11 _GLIBCXX_BEGIN_NAMESPACE_CXX11 2025-05-07T19:46:53.9679421Z #define _GLIBCXX_BEGIN_NAMESPACE_VERSION 2025-05-07T19:46:53.9679732Z #define _GLIBCXX_BITS_SPECFUN_H 1 2025-05-07T19:46:53.9680040Z #define _GLIBCXX_BITS_STD_ABS_H 2025-05-07T19:46:53.9680312Z #define _GLIBCXX_CMATH 1 2025-05-07T19:46:53.9680609Z #define _GLIBCXX_CONST __attribute__ ((__const__)) 2025-05-07T19:46:53.9680953Z #define _GLIBCXX_CONSTEXPR constexpr 2025-05-07T19:46:53.9681276Z #define _GLIBCXX_CPU_DEFINES 1 2025-05-07T19:46:53.9681556Z #define _GLIBCXX_CSTDLIB 1 2025-05-07T19:46:53.9681813Z #define _GLIBCXX_CXX_CONFIG_H 1 2025-05-07T19:46:53.9682253Z #define _GLIBCXX_DARWIN_USE_64_BIT_INODE 1 2025-05-07T19:46:53.9682578Z #define _GLIBCXX_DEBUG_ASSERT(_Condition) 2025-05-07T19:46:53.9682911Z #define _GLIBCXX_DEBUG_ASSERTIONS_H 1 2025-05-07T19:46:53.9683215Z #define _GLIBCXX_DEBUG_MACRO_SWITCH_H 1 2025-05-07T19:46:53.9683540Z #define _GLIBCXX_DEBUG_ONLY(_Statement) 2025-05-07T19:46:53.9683872Z #define _GLIBCXX_DEBUG_PEDASSERT(_Condition) 2025-05-07T19:46:53.9684263Z #define _GLIBCXX_DEFAULT_ABI_TAG _GLIBCXX_ABI_TAG_CXX11 2025-05-07T19:46:53.9684688Z #define _GLIBCXX_DEPRECATED __attribute__ ((__deprecated__)) 2025-05-07T19:46:53.9685281Z #define _GLIBCXX_DEPRECATED_SUGGEST(ALT) __attribute__ ((__deprecated__ ("use '" ALT "' instead"))) 2025-05-07T19:46:53.9685824Z #define _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 1 2025-05-07T19:46:53.9686138Z #define _GLIBCXX_END_EXTERN_C } 2025-05-07T19:46:53.9686436Z #define _GLIBCXX_END_NAMESPACE_ALGO 2025-05-07T19:46:53.9686746Z #define _GLIBCXX_END_NAMESPACE_CONTAINER 2025-05-07T19:46:53.9687078Z #define _GLIBCXX_END_NAMESPACE_CXX11 } 2025-05-07T19:46:53.9687383Z #define _GLIBCXX_END_NAMESPACE_LDBL 2025-05-07T19:46:53.9687801Z #define _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11 _GLIBCXX_END_NAMESPACE_CXX11 2025-05-07T19:46:53.9688351Z #define _GLIBCXX_END_NAMESPACE_VERSION 2025-05-07T19:46:53.9688655Z #define _GLIBCXX_EXTERN_TEMPLATE 1 2025-05-07T19:46:53.9688944Z #define _GLIBCXX_FAST_MATH 0 2025-05-07T19:46:53.9689215Z #define _GLIBCXX_FLOAT_IS_IEEE_BINARY32 1 2025-05-07T19:46:53.9689614Z #define _GLIBCXX_FORWARD(_Tp,__val) std::forward<_Tp>(__val) 2025-05-07T19:46:53.9689978Z #define _GLIBCXX_FULLY_DYNAMIC_STRING 0 2025-05-07T19:46:53.9690293Z #define _GLIBCXX_FWDREF(_Tp) _Tp&& 2025-05-07T19:46:53.9690570Z #define _GLIBCXX_HAS_GTHREADS 1 2025-05-07T19:46:53.9691459Z #define _GLIBCXX_HAS_NESTED_TYPE(_NTYPE) template> struct __has_##_NTYPE : false_type { }; template struct __has_##_NTYPE<_Tp, __void_t> : true_type { }; 2025-05-07T19:46:53.9692372Z #define _GLIBCXX_HAVE_ACOSF 1 2025-05-07T19:46:53.9692637Z #define _GLIBCXX_HAVE_ACOSL 1 2025-05-07T19:46:53.9692920Z #define _GLIBCXX_HAVE_ALIGNED_ALLOC 1 2025-05-07T19:46:53.9693216Z #define _GLIBCXX_HAVE_ARPA_INET_H 1 2025-05-07T19:46:53.9693514Z #define _GLIBCXX_HAVE_ASINF 1 2025-05-07T19:46:53.9693775Z #define _GLIBCXX_HAVE_ASINL 1 2025-05-07T19:46:53.9694067Z #define _GLIBCXX_HAVE_AS_SYMVER_DIRECTIVE 1 2025-05-07T19:46:53.9694372Z #define _GLIBCXX_HAVE_ATAN2F 1 2025-05-07T19:46:53.9694645Z #define _GLIBCXX_HAVE_ATAN2L 1 2025-05-07T19:46:53.9694919Z #define _GLIBCXX_HAVE_ATANF 1 2025-05-07T19:46:53.9695275Z #define _GLIBCXX_HAVE_ATANL 1 2025-05-07T19:46:53.9695735Z #define _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY 1 2025-05-07T19:46:53.9696069Z #define _GLIBCXX_HAVE_ATTRIBUTE_VISIBILITY 1 2025-05-07T19:46:53.9696539Z #define _GLIBCXX_HAVE_AT_QUICK_EXIT 1 2025-05-07T19:46:53.9696863Z #define _GLIBCXX_HAVE_BUILTIN_HAS_UNIQ_OBJ_REP 1 2025-05-07T19:46:53.9697229Z #define _GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE 1 2025-05-07T19:46:53.9697593Z #define _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED 1 2025-05-07T19:46:53.9697962Z #define _GLIBCXX_HAVE_BUILTIN_IS_SAME 1 2025-05-07T19:46:53.9698283Z #define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1 2025-05-07T19:46:53.9698580Z #define _GLIBCXX_HAVE_CEILF 1 2025-05-07T19:46:53.9698859Z #define _GLIBCXX_HAVE_CEILL 1 2025-05-07T19:46:53.9699131Z #define _GLIBCXX_HAVE_COMPLEX_H 1 2025-05-07T19:46:53.9699438Z #define _GLIBCXX_HAVE_COSF 1 2025-05-07T19:46:53.9699703Z #define _GLIBCXX_HAVE_COSHF 1 2025-05-07T19:46:53.9699979Z #define _GLIBCXX_HAVE_COSHL 1 2025-05-07T19:46:53.9700242Z #define _GLIBCXX_HAVE_COSL 1 2025-05-07T19:46:53.9700523Z #define _GLIBCXX_HAVE_DIRENT_H 1 2025-05-07T19:46:53.9700799Z #define _GLIBCXX_HAVE_DLFCN_H 1 2025-05-07T19:46:53.9701080Z #define _GLIBCXX_HAVE_ENDIAN_H 1 2025-05-07T19:46:53.9701412Z #define _GLIBCXX_HAVE_EXCEPTION_PTR_SINCE_GCC46 1 2025-05-07T19:46:53.9701751Z #define _GLIBCXX_HAVE_EXECINFO_H 1 2025-05-07T19:46:53.9702049Z #define _GLIBCXX_HAVE_EXPF 1 2025-05-07T19:46:53.9702373Z #define _GLIBCXX_HAVE_EXPL 1 2025-05-07T19:46:53.9702669Z #define _GLIBCXX_HAVE_FABSF 1 2025-05-07T19:46:53.9702939Z #define _GLIBCXX_HAVE_FABSL 1 2025-05-07T19:46:53.9703225Z #define _GLIBCXX_HAVE_FCNTL_H 1 2025-05-07T19:46:53.9703500Z #define _GLIBCXX_HAVE_FENV_H 1 2025-05-07T19:46:53.9703789Z #define _GLIBCXX_HAVE_FINITE 1 2025-05-07T19:46:53.9704061Z #define _GLIBCXX_HAVE_FINITEF 1 2025-05-07T19:46:53.9704347Z #define _GLIBCXX_HAVE_FINITEL 1 2025-05-07T19:46:53.9704632Z #define _GLIBCXX_HAVE_FLOAT_H 1 2025-05-07T19:46:53.9704899Z #define _GLIBCXX_HAVE_FLOORF 1 2025-05-07T19:46:53.9705186Z #define _GLIBCXX_HAVE_FLOORL 1 2025-05-07T19:46:53.9705455Z #define _GLIBCXX_HAVE_FMODF 1 2025-05-07T19:46:53.9705743Z #define _GLIBCXX_HAVE_FMODL 1 2025-05-07T19:46:53.9706011Z #define _GLIBCXX_HAVE_FREXPF 1 2025-05-07T19:46:53.9706302Z #define _GLIBCXX_HAVE_FREXPL 1 2025-05-07T19:46:53.9706577Z #define _GLIBCXX_HAVE_GETIPINFO 1 2025-05-07T19:46:53.9706880Z #define _GLIBCXX_HAVE_GETS 1 2025-05-07T19:46:53.9707147Z #define _GLIBCXX_HAVE_HYPOT 1 2025-05-07T19:46:53.9707436Z #define _GLIBCXX_HAVE_HYPOTF 1 2025-05-07T19:46:53.9707721Z #define _GLIBCXX_HAVE_HYPOTL 1 2025-05-07T19:46:53.9707994Z #define _GLIBCXX_HAVE_ICONV 1 2025-05-07T19:46:53.9708279Z #define _GLIBCXX_HAVE_INT64_T 1 2025-05-07T19:46:53.9708557Z #define _GLIBCXX_HAVE_INT64_T_LONG 1 2025-05-07T19:46:53.9708875Z #define _GLIBCXX_HAVE_INTTYPES_H 1 2025-05-07T19:46:53.9709158Z #define _GLIBCXX_HAVE_ISINF 1 2025-05-07T19:46:53.9709445Z #define _GLIBCXX_HAVE_ISINFF 1 2025-05-07T19:46:53.9709711Z #define _GLIBCXX_HAVE_ISINFL 1 2025-05-07T19:46:53.9709989Z #define _GLIBCXX_HAVE_ISNAN 1 2025-05-07T19:46:53.9710249Z #define _GLIBCXX_HAVE_ISNANF 1 2025-05-07T19:46:53.9710521Z #define _GLIBCXX_HAVE_ISNANL 1 2025-05-07T19:46:53.9710800Z #define _GLIBCXX_HAVE_ISWBLANK 1 2025-05-07T19:46:53.9711083Z #define _GLIBCXX_HAVE_LC_MESSAGES 1 2025-05-07T19:46:53.9711380Z #define _GLIBCXX_HAVE_LDEXPF 1 2025-05-07T19:46:53.9711644Z #define _GLIBCXX_HAVE_LDEXPL 1 2025-05-07T19:46:53.9711928Z #define _GLIBCXX_HAVE_LIMIT_AS 1 2025-05-07T19:46:53.9712320Z #define _GLIBCXX_HAVE_LIMIT_DATA 1 2025-05-07T19:46:53.9712724Z #define _GLIBCXX_HAVE_LIMIT_FSIZE 1 2025-05-07T19:46:53.9712991Z #define _GLIBCXX_HAVE_LIMIT_RSS 1 2025-05-07T19:46:53.9713270Z #define _GLIBCXX_HAVE_LIMIT_VMEM 0 2025-05-07T19:46:53.9713536Z #define _GLIBCXX_HAVE_LINK 1 2025-05-07T19:46:53.9713801Z #define _GLIBCXX_HAVE_LINUX_FUTEX 1 2025-05-07T19:46:53.9714090Z #define _GLIBCXX_HAVE_LINUX_RANDOM_H 1 2025-05-07T19:46:53.9714372Z #define _GLIBCXX_HAVE_LINUX_TYPES_H 1 2025-05-07T19:46:53.9714659Z #define _GLIBCXX_HAVE_LOCALE_H 1 2025-05-07T19:46:53.9714914Z #define _GLIBCXX_HAVE_LOG10F 1 2025-05-07T19:46:53.9715176Z #define _GLIBCXX_HAVE_LOG10L 1 2025-05-07T19:46:53.9715422Z #define _GLIBCXX_HAVE_LOGF 1 2025-05-07T19:46:53.9715755Z #define _GLIBCXX_HAVE_LOGL 1 2025-05-07T19:46:53.9716002Z #define _GLIBCXX_HAVE_MBSTATE_T 1 2025-05-07T19:46:53.9716280Z #define _GLIBCXX_HAVE_MEMALIGN 1 2025-05-07T19:46:53.9716540Z #define _GLIBCXX_HAVE_MEMORY_H 1 2025-05-07T19:46:53.9716819Z #define _GLIBCXX_HAVE_MODF 1 2025-05-07T19:46:53.9717085Z #define _GLIBCXX_HAVE_MODFF 1 2025-05-07T19:46:53.9717332Z #define _GLIBCXX_HAVE_MODFL 1 2025-05-07T19:46:53.9717596Z #define _GLIBCXX_HAVE_NETDB_H 1 2025-05-07T19:46:53.9717851Z #define _GLIBCXX_HAVE_NETINET_IN_H 1 2025-05-07T19:46:53.9718141Z #define _GLIBCXX_HAVE_NETINET_TCP_H 1 2025-05-07T19:46:53.9718421Z #define _GLIBCXX_HAVE_OBSOLETE_ISINF 1 2025-05-07T19:46:53.9718714Z #define _GLIBCXX_HAVE_OBSOLETE_ISNAN 1 2025-05-07T19:46:53.9718987Z #define _GLIBCXX_HAVE_POLL 1 2025-05-07T19:46:53.9719247Z #define _GLIBCXX_HAVE_POLL_H 1 2025-05-07T19:46:53.9719502Z #define _GLIBCXX_HAVE_POSIX_MEMALIGN 1 2025-05-07T19:46:53.9719800Z #define _GLIBCXX_HAVE_POSIX_SEMAPHORE 1 2025-05-07T19:46:53.9720096Z #define _GLIBCXX_HAVE_POWF 1 2025-05-07T19:46:53.9720341Z #define _GLIBCXX_HAVE_POWL 1 2025-05-07T19:46:53.9720603Z #define _GLIBCXX_HAVE_QUICK_EXIT 1 2025-05-07T19:46:53.9720866Z #define _GLIBCXX_HAVE_READLINK 1 2025-05-07T19:46:53.9721194Z #define _GLIBCXX_HAVE_SETENV 1 2025-05-07T19:46:53.9721590Z #define _GLIBCXX_HAVE_SINCOS 1 2025-05-07T19:46:53.9721853Z #define _GLIBCXX_HAVE_SINCOSF 1 2025-05-07T19:46:53.9722101Z #define _GLIBCXX_HAVE_SINCOSL 1 2025-05-07T19:46:53.9722363Z #define _GLIBCXX_HAVE_SINF 1 2025-05-07T19:46:53.9722610Z #define _GLIBCXX_HAVE_SINHF 1 2025-05-07T19:46:53.9722874Z #define _GLIBCXX_HAVE_SINHL 1 2025-05-07T19:46:53.9723132Z #define _GLIBCXX_HAVE_SINL 1 2025-05-07T19:46:53.9723381Z #define _GLIBCXX_HAVE_SOCKATMARK 1 2025-05-07T19:46:53.9723655Z #define _GLIBCXX_HAVE_SQRTF 1 2025-05-07T19:46:53.9723898Z #define _GLIBCXX_HAVE_SQRTL 1 2025-05-07T19:46:53.9724161Z #define _GLIBCXX_HAVE_STDALIGN_H 1 2025-05-07T19:46:53.9724426Z #define _GLIBCXX_HAVE_STDBOOL_H 1 2025-05-07T19:46:53.9724706Z #define _GLIBCXX_HAVE_STDINT_H 1 2025-05-07T19:46:53.9724959Z #define _GLIBCXX_HAVE_STDLIB_H 1 2025-05-07T19:46:53.9725231Z #define _GLIBCXX_HAVE_STRERROR_L 1 2025-05-07T19:46:53.9725498Z #define _GLIBCXX_HAVE_STRERROR_R 1 2025-05-07T19:46:53.9725776Z #define _GLIBCXX_HAVE_STRINGS_H 1 2025-05-07T19:46:53.9726049Z #define _GLIBCXX_HAVE_STRING_H 1 2025-05-07T19:46:53.9726303Z #define _GLIBCXX_HAVE_STRTOF 1 2025-05-07T19:46:53.9726734Z #define _GLIBCXX_HAVE_STRTOLD 1 2025-05-07T19:46:53.9727018Z #define _GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE 1 2025-05-07T19:46:53.9727350Z #define _GLIBCXX_HAVE_STRXFRM_L 1 2025-05-07T19:46:53.9727629Z #define _GLIBCXX_HAVE_SYMLINK 1 2025-05-07T19:46:53.9728163Z #define _GLIBCXX_HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT 1 2025-05-07T19:46:53.9728549Z #define _GLIBCXX_HAVE_SYS_IOCTL_H 1 2025-05-07T19:46:53.9728849Z #define _GLIBCXX_HAVE_SYS_IPC_H 1 2025-05-07T19:46:53.9729146Z #define _GLIBCXX_HAVE_SYS_PARAM_H 1 2025-05-07T19:46:53.9729442Z #define _GLIBCXX_HAVE_SYS_RESOURCE_H 1 2025-05-07T19:46:53.9729759Z #define _GLIBCXX_HAVE_SYS_SEM_H 1 2025-05-07T19:46:53.9730038Z #define _GLIBCXX_HAVE_SYS_SOCKET_H 1 2025-05-07T19:46:53.9730347Z #define _GLIBCXX_HAVE_SYS_STATVFS_H 1 2025-05-07T19:46:53.9730642Z #define _GLIBCXX_HAVE_SYS_STAT_H 1 2025-05-07T19:46:53.9730942Z #define _GLIBCXX_HAVE_SYS_SYSINFO_H 1 2025-05-07T19:46:53.9731237Z #define _GLIBCXX_HAVE_SYS_TIME_H 1 2025-05-07T19:46:53.9731536Z #define _GLIBCXX_HAVE_SYS_TYPES_H 1 2025-05-07T19:46:53.9731826Z #define _GLIBCXX_HAVE_SYS_UIO_H 1 2025-05-07T19:46:53.9732118Z #define _GLIBCXX_HAVE_S_ISREG 1 2025-05-07T19:46:53.9732403Z #define _GLIBCXX_HAVE_TANF 1 2025-05-07T19:46:53.9732670Z #define _GLIBCXX_HAVE_TANHF 1 2025-05-07T19:46:53.9732953Z #define _GLIBCXX_HAVE_TANHL 1 2025-05-07T19:46:53.9733217Z #define _GLIBCXX_HAVE_TANL 1 2025-05-07T19:46:53.9733500Z #define _GLIBCXX_HAVE_TGMATH_H 1 2025-05-07T19:46:53.9733778Z #define _GLIBCXX_HAVE_TLS 1 2025-05-07T19:46:53.9734059Z #define _GLIBCXX_HAVE_TRUNCATE 1 2025-05-07T19:46:53.9734471Z #define _GLIBCXX_HAVE_UNISTD_H 1 2025-05-07T19:46:53.9734767Z #define _GLIBCXX_HAVE_USELOCALE 1 2025-05-07T19:46:53.9735050Z #define _GLIBCXX_HAVE_UTIME_H 1 2025-05-07T19:46:53.9735416Z #define _GLIBCXX_HAVE_VFWSCANF 1 2025-05-07T19:46:53.9735719Z #define _GLIBCXX_HAVE_VSWSCANF 1 2025-05-07T19:46:53.9735999Z #define _GLIBCXX_HAVE_VWSCANF 1 2025-05-07T19:46:53.9736287Z #define _GLIBCXX_HAVE_WCHAR_H 1 2025-05-07T19:46:53.9736559Z #define _GLIBCXX_HAVE_WCSTOF 1 2025-05-07T19:46:53.9736845Z #define _GLIBCXX_HAVE_WCTYPE_H 1 2025-05-07T19:46:53.9737125Z #define _GLIBCXX_HAVE_WRITEV 1 2025-05-07T19:46:53.9737415Z #define _GLIBCXX_HAVE_XLOCALE_H 1 2025-05-07T19:46:53.9737693Z #define _GLIBCXX_HOSTED 1 2025-05-07T19:46:53.9737968Z #define _GLIBCXX_ICONV_CONST 2025-05-07T19:46:53.9738245Z #define _GLIBCXX_INLINE_VERSION 0 2025-05-07T19:46:53.9738551Z #define _GLIBCXX_LT_OBJDIR ".libs/" 2025-05-07T19:46:53.9739071Z #define _GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR(_Iter) std::__make_move_if_noexcept_iterator(_Iter) 2025-05-07T19:46:53.9739716Z #define _GLIBCXX_MAKE_MOVE_ITERATOR(_Iter) std::make_move_iterator(_Iter) 2025-05-07T19:46:53.9740153Z #define _GLIBCXX_MANGLE_SIZE_T m 2025-05-07T19:46:53.9740426Z #define _GLIBCXX_MATH_H 1 2025-05-07T19:46:53.9740791Z #define _GLIBCXX_MOVE(__val) std::move(__val) 2025-05-07T19:46:53.9741179Z #define _GLIBCXX_MOVE3(_Tp,_Up,_Vp) std::move(_Tp, _Up, _Vp) 2025-05-07T19:46:53.9741694Z #define _GLIBCXX_MOVE_BACKWARD3(_Tp,_Up,_Vp) std::move_backward(_Tp, _Up, _Vp) 2025-05-07T19:46:53.9742152Z #define _GLIBCXX_NAMESPACE_CXX11 __cxx11:: 2025-05-07T19:46:53.9742476Z #define _GLIBCXX_NAMESPACE_LDBL 2025-05-07T19:46:53.9742860Z #define _GLIBCXX_NAMESPACE_LDBL_OR_CXX11 _GLIBCXX_NAMESPACE_CXX11 2025-05-07T19:46:53.9743429Z #define _GLIBCXX_NATIVE_THREAD_ID (__gthread_active_p() ? __gthread_self() : (__gthread_t)1) 2025-05-07T19:46:53.9743948Z #define _GLIBCXX_NODISCARD [[__nodiscard__]] 2025-05-07T19:46:53.9744271Z #define _GLIBCXX_NOEXCEPT noexcept 2025-05-07T19:46:53.9744623Z #define _GLIBCXX_NOEXCEPT_IF(...) noexcept(__VA_ARGS__) 2025-05-07T19:46:53.9744993Z #define _GLIBCXX_NOEXCEPT_PARM , bool _NE 2025-05-07T19:46:53.9745343Z #define _GLIBCXX_NOEXCEPT_QUAL noexcept (_NE) 2025-05-07T19:46:53.9745741Z #define _GLIBCXX_NORETURN __attribute__ ((__noreturn__)) 2025-05-07T19:46:53.9746120Z #define _GLIBCXX_NOTHROW _GLIBCXX_USE_NOEXCEPT 2025-05-07T19:46:53.9746557Z #define _GLIBCXX_NO_OBSOLETE_ISINF_ISNAN_DYNAMIC __GLIBC_PREREQ(2,23) 2025-05-07T19:46:53.9746964Z #define _GLIBCXX_NUMERIC_LIMITS 1 2025-05-07T19:46:53.9747259Z #define _GLIBCXX_OS_DEFINES 1 2025-05-07T19:46:53.9747645Z #define _GLIBCXX_PACKAGE_BUGREPORT "" 2025-05-07T19:46:53.9748080Z #define _GLIBCXX_PACKAGE_NAME "package-unused" 2025-05-07T19:46:53.9748466Z #define _GLIBCXX_PACKAGE_STRING "package-unused version-unused" 2025-05-07T19:46:53.9748865Z #define _GLIBCXX_PACKAGE_TARNAME "libstdc++" 2025-05-07T19:46:53.9749179Z #define _GLIBCXX_PACKAGE_URL "" 2025-05-07T19:46:53.9749494Z #define _GLIBCXX_PACKAGE__GLIBCXX_VERSION "version-unused" 2025-05-07T19:46:53.9749858Z #define _GLIBCXX_PREDEFINED_OPS_H 1 2025-05-07T19:46:53.9750137Z #define _GLIBCXX_PSEUDO_VISIBILITY(V) 2025-05-07T19:46:53.9750455Z #define _GLIBCXX_PURE __attribute__ ((__pure__)) 2025-05-07T19:46:53.9750760Z #define _GLIBCXX_RELEASE 11 2025-05-07T19:46:53.9751016Z #define _GLIBCXX_RES_LIMITS 1 2025-05-07T19:46:53.9751261Z #define _GLIBCXX_STDC_HEADERS 1 2025-05-07T19:46:53.9751528Z #define _GLIBCXX_STDIO_EOF -1 2025-05-07T19:46:53.9751781Z #define _GLIBCXX_STDIO_SEEK_CUR 1 2025-05-07T19:46:53.9752060Z #define _GLIBCXX_STDIO_SEEK_END 2 2025-05-07T19:46:53.9752331Z #define _GLIBCXX_STDLIB_H 1 2025-05-07T19:46:53.9752569Z #define _GLIBCXX_STD_A std 2025-05-07T19:46:53.9752818Z #define _GLIBCXX_STD_C std 2025-05-07T19:46:53.9753049Z #define _GLIBCXX_SYMVER 1 2025-05-07T19:46:53.9753293Z #define _GLIBCXX_SYMVER_GNU 1 2025-05-07T19:46:53.9753579Z #define _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(A) 2025-05-07T19:46:53.9753950Z #define _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(A) 2025-05-07T19:46:53.9754342Z #define _GLIBCXX_THROW(_EXC) 2025-05-07T19:46:53.9754646Z #define _GLIBCXX_THROW_OR_ABORT(_EXC) (throw (_EXC)) 2025-05-07T19:46:53.9754994Z #define _GLIBCXX_TR1_BESSEL_FUNCTION_TCC 1 2025-05-07T19:46:53.9755299Z #define _GLIBCXX_TR1_BETA_FUNCTION_TCC 1 2025-05-07T19:46:53.9755610Z #define _GLIBCXX_TR1_ELL_INTEGRAL_TCC 1 2025-05-07T19:46:53.9755898Z #define _GLIBCXX_TR1_EXP_INTEGRAL_TCC 1 2025-05-07T19:46:53.9756192Z #define _GLIBCXX_TR1_GAMMA_TCC 1 2025-05-07T19:46:53.9756465Z #define _GLIBCXX_TR1_HYPERGEOMETRIC_TCC 1 2025-05-07T19:46:53.9756783Z #define _GLIBCXX_TR1_LEGENDRE_FUNCTION_TCC 1 2025-05-07T19:46:53.9757100Z #define _GLIBCXX_TR1_MODIFIED_BESSEL_FUNC_TCC 1 2025-05-07T19:46:53.9757423Z #define _GLIBCXX_TR1_POLY_HERMITE_TCC 1 2025-05-07T19:46:53.9757709Z #define _GLIBCXX_TR1_POLY_LAGUERRE_TCC 1 2025-05-07T19:46:53.9758017Z #define _GLIBCXX_TR1_RIEMANN_ZETA_TCC 1 2025-05-07T19:46:53.9758329Z #define _GLIBCXX_TR1_SPECIAL_FUNCTION_UTIL_H 1 2025-05-07T19:46:53.9758625Z #define _GLIBCXX_TXN_SAFE 2025-05-07T19:46:53.9758883Z #define _GLIBCXX_TXN_SAFE_DYN 2025-05-07T19:46:53.9759133Z #define _GLIBCXX_TYPE_TRAITS 1 2025-05-07T19:46:53.9759574Z #define _GLIBCXX_USE_ALLOCATOR_NEW 1 2025-05-07T19:46:53.9759916Z #define _GLIBCXX_USE_C99 1 2025-05-07T19:46:53.9760242Z #define _GLIBCXX_USE_C99_COMPLEX _GLIBCXX11_USE_C99_COMPLEX 2025-05-07T19:46:53.9760772Z #define _GLIBCXX_USE_C99_COMPLEX_TR1 1 2025-05-07T19:46:53.9761138Z #define _GLIBCXX_USE_C99_CTYPE_TR1 1 2025-05-07T19:46:53.9761445Z #define _GLIBCXX_USE_C99_FENV_TR1 1 2025-05-07T19:46:53.9761741Z #define _GLIBCXX_USE_C99_INTTYPES_TR1 1 2025-05-07T19:46:53.9762080Z #define _GLIBCXX_USE_C99_INTTYPES_WCHAR_T_TR1 1 2025-05-07T19:46:53.9762449Z #define _GLIBCXX_USE_C99_MATH _GLIBCXX11_USE_C99_MATH 2025-05-07T19:46:53.9762804Z #define _GLIBCXX_USE_C99_MATH_TR1 1 2025-05-07T19:46:53.9763093Z #define _GLIBCXX_USE_C99_STDINT_TR1 1 2025-05-07T19:46:53.9763453Z #define _GLIBCXX_USE_C99_STDIO _GLIBCXX11_USE_C99_STDIO 2025-05-07T19:46:53.9763861Z #define _GLIBCXX_USE_C99_STDLIB _GLIBCXX11_USE_C99_STDLIB 2025-05-07T19:46:53.9764280Z #define _GLIBCXX_USE_C99_WCHAR _GLIBCXX11_USE_C99_WCHAR 2025-05-07T19:46:53.9764659Z #define _GLIBCXX_USE_CLOCK_MONOTONIC 1 2025-05-07T19:46:53.9764968Z #define _GLIBCXX_USE_CLOCK_REALTIME 1 2025-05-07T19:46:53.9765295Z #define _GLIBCXX_USE_CONSTEXPR constexpr 2025-05-07T19:46:53.9765604Z #define _GLIBCXX_USE_CXX11_ABI 1 2025-05-07T19:46:53.9765905Z #define _GLIBCXX_USE_DECIMAL_FLOAT 1 2025-05-07T19:46:53.9766204Z #define _GLIBCXX_USE_DEPRECATED 1 2025-05-07T19:46:53.9766505Z #define _GLIBCXX_USE_DEV_RANDOM 1 2025-05-07T19:46:53.9766788Z #define _GLIBCXX_USE_DUAL_ABI 1 2025-05-07T19:46:53.9767080Z #define _GLIBCXX_USE_FCHMOD 1 2025-05-07T19:46:53.9767351Z #define _GLIBCXX_USE_FCHMODAT 1 2025-05-07T19:46:53.9767641Z #define _GLIBCXX_USE_FLOAT128 1 2025-05-07T19:46:53.9767940Z #define _GLIBCXX_USE_GETTIMEOFDAY 1 2025-05-07T19:46:53.9768238Z #define _GLIBCXX_USE_GET_NPROCS 1 2025-05-07T19:46:53.9768537Z #define _GLIBCXX_USE_INT128 1 2025-05-07T19:46:53.9768813Z #define _GLIBCXX_USE_LFS 1 2025-05-07T19:46:53.9769094Z #define _GLIBCXX_USE_LONG_LONG 1 2025-05-07T19:46:53.9769373Z #define _GLIBCXX_USE_LSTAT 1 2025-05-07T19:46:53.9769664Z #define _GLIBCXX_USE_NANOSLEEP 1 2025-05-07T19:46:53.9769952Z #define _GLIBCXX_USE_NOEXCEPT noexcept 2025-05-07T19:46:53.9770278Z #define _GLIBCXX_USE_PTHREAD_RWLOCK_T 1 2025-05-07T19:46:53.9770584Z #define _GLIBCXX_USE_RANDOM_TR1 1 2025-05-07T19:46:53.9770887Z #define _GLIBCXX_USE_REALPATH 1 2025-05-07T19:46:53.9771290Z #define _GLIBCXX_USE_SCHED_YIELD 1 2025-05-07T19:46:53.9771587Z #define _GLIBCXX_USE_SC_NPROCESSORS_ONLN 1 2025-05-07T19:46:53.9771917Z #define _GLIBCXX_USE_SENDFILE 1 2025-05-07T19:46:53.9772187Z #define _GLIBCXX_USE_STD_SPEC_FUNCS 1 2025-05-07T19:46:53.9772581Z #define _GLIBCXX_USE_ST_MTIM 1 2025-05-07T19:46:53.9772902Z #define _GLIBCXX_USE_TBB_PAR_BACKEND __has_include() 2025-05-07T19:46:53.9773266Z #define _GLIBCXX_USE_TMPNAM 1 2025-05-07T19:46:53.9773515Z #define _GLIBCXX_USE_UTIME 1 2025-05-07T19:46:53.9773849Z #define _GLIBCXX_USE_UTIMENSAT 1 2025-05-07T19:46:53.9774121Z #define _GLIBCXX_USE_WCHAR_T 1 2025-05-07T19:46:53.9774383Z #define _GLIBCXX_USE_WEAK_REF __GXX_WEAK__ 2025-05-07T19:46:53.9774846Z #define _GLIBCXX_UTILITY 1 2025-05-07T19:46:53.9775315Z #define _GLIBCXX_VERBOSE 1 2025-05-07T19:46:53.9775808Z #define _GLIBCXX_VISIBILITY(V) __attribute__ ((__visibility__ (#V))) 2025-05-07T19:46:53.9776214Z #define _GLIBCXX_WEAK_DEFINITION 2025-05-07T19:46:53.9776511Z #define _GLIBCXX_X86_RDRAND 1 2025-05-07T19:46:53.9776775Z #define _GLIBCXX_X86_RDSEED 1 2025-05-07T19:46:53.9777046Z #define _GNU_SOURCE 1 2025-05-07T19:46:53.9777297Z #define _GTHREAD_USE_MUTEX_TIMEDLOCK 1 2025-05-07T19:46:53.9777603Z #define _G_BUFSIZ 8192 2025-05-07T19:46:53.9777847Z #define _G_HAVE_MMAP 1 2025-05-07T19:46:53.9778079Z #define _G_HAVE_MREMAP 1 2025-05-07T19:46:53.9778399Z #define _G_HAVE_ST_BLKSIZE defined (_STATBUF_ST_BLKSIZE) 2025-05-07T19:46:53.9778757Z #define _G_IO_IO_FILE_VERSION 0x20001 2025-05-07T19:46:53.9779057Z #define _G_config_h 1 2025-05-07T19:46:53.9779298Z #define _G_va_list __gnuc_va_list 2025-05-07T19:46:53.9779588Z #define _INITIALIZER_LIST 2025-05-07T19:46:53.9779832Z #define _IOFBF 0 2025-05-07T19:46:53.9780170Z #define _IOLBF 1 2025-05-07T19:46:53.9780387Z #define _IONBF 2 2025-05-07T19:46:53.9780630Z #define _IOS_APPEND 8 2025-05-07T19:46:53.9780867Z #define _IOS_ATEND 4 2025-05-07T19:46:53.9781110Z #define _IOS_BIN 128 2025-05-07T19:46:53.9781353Z #define _IOS_INPUT 1 2025-05-07T19:46:53.9781587Z #define _IOS_NOCREATE 32 2025-05-07T19:46:53.9781852Z #define _IOS_NOREPLACE 64 2025-05-07T19:46:53.9782095Z #define _IOS_OUTPUT 2 2025-05-07T19:46:53.9782338Z #define _IOS_TRUNC 16 2025-05-07T19:46:53.9782570Z #define _IO_BAD_SEEN 0x4000 2025-05-07T19:46:53.9782897Z #define _IO_BE(expr,res) __builtin_expect ((expr), res) 2025-05-07T19:46:53.9783246Z #define _IO_BOOLALPHA 0200000 2025-05-07T19:46:53.9783528Z #define _IO_BUFSIZ _G_BUFSIZ 2025-05-07T19:46:53.9783795Z #define _IO_CURRENTLY_PUTTING 0x800 2025-05-07T19:46:53.9784089Z #define _IO_DEC 020 2025-05-07T19:46:53.9784333Z #define _IO_DELETE_DONT_CLOSE 0x40 2025-05-07T19:46:53.9784614Z #define _IO_DONT_CLOSE 0100000 2025-05-07T19:46:53.9784893Z #define _IO_EOF_SEEN 0x10 2025-05-07T19:46:53.9785136Z #define _IO_ERR_SEEN 0x20 2025-05-07T19:46:53.9785392Z #define _IO_FIXED 010000 2025-05-07T19:46:53.9785634Z #define _IO_FLAGS2_MMAP 1 2025-05-07T19:46:53.9785898Z #define _IO_FLAGS2_NOTCANCEL 2 2025-05-07T19:46:53.9786168Z #define _IO_FLAGS2_USER_WBUF 8 2025-05-07T19:46:53.9786469Z #define _IO_HAVE_ST_BLKSIZE _G_HAVE_ST_BLKSIZE 2025-05-07T19:46:53.9786777Z #define _IO_HEX 0100 2025-05-07T19:46:53.9787018Z #define _IO_INTERNAL 010 2025-05-07T19:46:53.9787277Z #define _IO_IN_BACKUP 0x100 2025-05-07T19:46:53.9787536Z #define _IO_IS_APPENDING 0x1000 2025-05-07T19:46:53.9787818Z #define _IO_IS_FILEBUF 0x2000 2025-05-07T19:46:53.9788072Z #define _IO_LEFT 02 2025-05-07T19:46:53.9788307Z #define _IO_LINE_BUF 0x200 2025-05-07T19:46:53.9788556Z #define _IO_LINKED 0x80 2025-05-07T19:46:53.9788808Z #define _IO_MAGIC 0xFBAD0000 2025-05-07T19:46:53.9789072Z #define _IO_MAGIC_MASK 0xFFFF0000 2025-05-07T19:46:53.9789353Z #define _IO_NO_READS 4 2025-05-07T19:46:53.9789588Z #define _IO_NO_WRITES 8 2025-05-07T19:46:53.9789833Z #define _IO_OCT 040 2025-05-07T19:46:53.9790209Z #define _IO_PENDING_OUTPUT_COUNT(_fp) ((_fp)->_IO_write_ptr - (_fp)->_IO_write_base) 2025-05-07T19:46:53.9790666Z #define _IO_RIGHT 04 2025-05-07T19:46:53.9790916Z #define _IO_SCIENTIFIC 04000 2025-05-07T19:46:53.9791181Z #define _IO_SHOWBASE 0200 2025-05-07T19:46:53.9791444Z #define _IO_SHOWPOINT 0400 2025-05-07T19:46:53.9791694Z #define _IO_SHOWPOS 02000 2025-05-07T19:46:53.9791949Z #define _IO_SKIPWS 01 2025-05-07T19:46:53.9792179Z #define _IO_STDIO 040000 2025-05-07T19:46:53.9792432Z #define _IO_STDIO_H 2025-05-07T19:46:53.9792673Z #define _IO_TIED_PUT_GET 0x400 2025-05-07T19:46:53.9792947Z #define _IO_UNBUFFERED 2 2025-05-07T19:46:53.9793200Z #define _IO_UNIFIED_JUMPTABLES 1 2025-05-07T19:46:53.9793602Z #define _IO_UNITBUF 020000 2025-05-07T19:46:53.9793874Z #define _IO_UPPERCASE 01000 2025-05-07T19:46:53.9794129Z #define _IO_USER_BUF 1 2025-05-07T19:46:53.9794387Z #define _IO_USER_LOCK 0x8000 2025-05-07T19:46:53.9879119Z #define _IO_cleanup_region_end(_Doit) 2025-05-07T19:46:53.9879463Z #define _IO_cleanup_region_start(_fct,_fp) 2025-05-07T19:46:53.9879868Z #define _IO_feof_unlocked(__fp) (((__fp)->_flags & _IO_EOF_SEEN) != 0) 2025-05-07T19:46:53.9880363Z #define _IO_ferror_unlocked(__fp) (((__fp)->_flags & _IO_ERR_SEEN) != 0) 2025-05-07T19:46:53.9880765Z #define _IO_file_flags _flags 2025-05-07T19:46:53.9881020Z #define _IO_flockfile(_fp) 2025-05-07T19:46:53.9881283Z #define _IO_fpos64_t _G_fpos64_t 2025-05-07T19:46:53.9881556Z #define _IO_fpos_t _G_fpos_t 2025-05-07T19:46:53.9881825Z #define _IO_ftrylockfile(_fp) 2025-05-07T19:46:53.9882095Z #define _IO_funlockfile(_fp) 2025-05-07T19:46:53.9882654Z #define _IO_getc_unlocked(_fp) (_IO_BE ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end, 0) ? __uflow (_fp) : *(unsigned char *) (_fp)->_IO_read_ptr++) 2025-05-07T19:46:53.9883251Z #define _IO_iconv_t _G_iconv_t 2025-05-07T19:46:53.9883513Z #define _IO_off64_t __off64_t 2025-05-07T19:46:53.9883980Z #define _IO_off_t __off_t 2025-05-07T19:46:53.9884267Z #define _IO_peekc(_fp) _IO_peekc_unlocked (_fp) 2025-05-07T19:46:53.9884922Z #define _IO_peekc_unlocked(_fp) (_IO_BE ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end, 0) && __underflow (_fp) == EOF ? EOF : *(unsigned char *) (_fp)->_IO_read_ptr) 2025-05-07T19:46:53.9885538Z #define _IO_pid_t __pid_t 2025-05-07T19:46:53.9886188Z #define _IO_putc_unlocked(_ch,_fp) (_IO_BE ((_fp)->_IO_write_ptr >= (_fp)->_IO_write_end, 0) ? __overflow (_fp, (unsigned char) (_ch)) : (unsigned char) (*(_fp)->_IO_write_ptr++ = (_ch))) 2025-05-07T19:46:53.9886881Z #define _IO_size_t size_t 2025-05-07T19:46:53.9887123Z #define _IO_ssize_t __ssize_t 2025-05-07T19:46:53.9887430Z #define _IO_stderr ((_IO_FILE*)(&_IO_2_1_stderr_)) 2025-05-07T19:46:53.9887783Z #define _IO_stdin ((_IO_FILE*)(&_IO_2_1_stdin_)) 2025-05-07T19:46:53.9888151Z #define _IO_stdout ((_IO_FILE*)(&_IO_2_1_stdout_)) 2025-05-07T19:46:53.9888471Z #define _IO_uid_t __uid_t 2025-05-07T19:46:53.9888736Z #define _IO_va_list __gnuc_va_list 2025-05-07T19:46:53.9889011Z #define _IO_wint_t wint_t 2025-05-07T19:46:53.9889269Z #define _ISOC11_SOURCE 1 2025-05-07T19:46:53.9889507Z #define _ISOC95_SOURCE 1 2025-05-07T19:46:53.9889755Z #define _ISOC99_SOURCE 1 2025-05-07T19:46:53.9890096Z #define _ISbit(bit) ((bit) < 8 ? ((1 << (bit)) << 8) : ((1 << (bit)) >> 8)) 2025-05-07T19:46:53.9890483Z #define _LARGEFILE64_SOURCE 1 2025-05-07T19:46:53.9890864Z #define _LARGEFILE_SOURCE 1 2025-05-07T19:46:53.9891212Z #define _LIBC_LIMITS_H_ 1 2025-05-07T19:46:53.9891452Z #define _LINUX_LIMITS_H 2025-05-07T19:46:53.9891670Z #define _LP64 1 2025-05-07T19:46:53.9891880Z #define _MATH_H 1 2025-05-07T19:46:53.9892084Z #define _MATH_H_MATHDEF 1 2025-05-07T19:46:53.9892318Z #define _MOVE_H 1 2025-05-07T19:46:53.9892521Z #define _Mfloat_ float 2025-05-07T19:46:53.9892772Z #define _Mlong_double_ long double 2025-05-07T19:46:53.9893030Z #define _NEW 2025-05-07T19:46:53.9893234Z #define _OLD_STDIO_MAGIC 0xFABC0000 2025-05-07T19:46:53.9893512Z #define _POSIX2_BC_BASE_MAX 99 2025-05-07T19:46:53.9893765Z #define _POSIX2_BC_DIM_MAX 2048 2025-05-07T19:46:53.9894029Z #define _POSIX2_BC_SCALE_MAX 99 2025-05-07T19:46:53.9894281Z #define _POSIX2_BC_STRING_MAX 1000 2025-05-07T19:46:53.9894568Z #define _POSIX2_CHARCLASS_NAME_MAX 14 2025-05-07T19:46:53.9894846Z #define _POSIX2_COLL_WEIGHTS_MAX 2 2025-05-07T19:46:53.9895120Z #define _POSIX2_EXPR_NEST_MAX 32 2025-05-07T19:46:53.9895473Z #define _POSIX2_LINE_MAX 2048 2025-05-07T19:46:53.9895915Z #define _POSIX2_RE_DUP_MAX 255 2025-05-07T19:46:53.9896198Z #define _POSIX_AIO_LISTIO_MAX 2 2025-05-07T19:46:53.9896461Z #define _POSIX_AIO_MAX 1 2025-05-07T19:46:53.9896769Z #define _POSIX_ARG_MAX 4096 2025-05-07T19:46:53.9897024Z #define _POSIX_CHILD_MAX 25 2025-05-07T19:46:53.9897342Z #define _POSIX_CLOCKRES_MIN 20000000 2025-05-07T19:46:53.9897759Z #define _POSIX_C_SOURCE 200809L 2025-05-07T19:46:53.9898044Z #define _POSIX_DELAYTIMER_MAX 32 2025-05-07T19:46:53.9898336Z #define _POSIX_FD_SETSIZE _POSIX_OPEN_MAX 2025-05-07T19:46:53.9898671Z #define _POSIX_HIWAT _POSIX_PIPE_BUF 2025-05-07T19:46:53.9898964Z #define _POSIX_HOST_NAME_MAX 255 2025-05-07T19:46:53.9899248Z #define _POSIX_LINK_MAX 8 2025-05-07T19:46:53.9899514Z #define _POSIX_LOGIN_NAME_MAX 9 2025-05-07T19:46:53.9899786Z #define _POSIX_MAX_CANON 255 2025-05-07T19:46:53.9900061Z #define _POSIX_MAX_INPUT 255 2025-05-07T19:46:53.9900317Z #define _POSIX_MQ_OPEN_MAX 8 2025-05-07T19:46:53.9900595Z #define _POSIX_MQ_PRIO_MAX 32 2025-05-07T19:46:53.9900855Z #define _POSIX_NAME_MAX 14 2025-05-07T19:46:53.9901121Z #define _POSIX_NGROUPS_MAX 8 2025-05-07T19:46:53.9901379Z #define _POSIX_OPEN_MAX 20 2025-05-07T19:46:53.9901640Z #define _POSIX_PATH_MAX 256 2025-05-07T19:46:53.9901890Z #define _POSIX_PIPE_BUF 512 2025-05-07T19:46:53.9902150Z #define _POSIX_QLIMIT 1 2025-05-07T19:46:53.9902399Z #define _POSIX_RE_DUP_MAX 255 2025-05-07T19:46:53.9902672Z #define _POSIX_RTSIG_MAX 8 2025-05-07T19:46:53.9902940Z #define _POSIX_SEM_NSEMS_MAX 256 2025-05-07T19:46:53.9903215Z #define _POSIX_SEM_VALUE_MAX 32767 2025-05-07T19:46:53.9903596Z #define _POSIX_SIGQUEUE_MAX 32 2025-05-07T19:46:53.9903859Z #define _POSIX_SOURCE 1 2025-05-07T19:46:53.9904118Z #define _POSIX_SSIZE_MAX 32767 2025-05-07T19:46:53.9904381Z #define _POSIX_STREAM_MAX 8 2025-05-07T19:46:53.9904654Z #define _POSIX_SYMLINK_MAX 255 2025-05-07T19:46:53.9904918Z #define _POSIX_SYMLOOP_MAX 8 2025-05-07T19:46:53.9905225Z #define _POSIX_THREAD_DESTRUCTOR_ITERATIONS 4 2025-05-07T19:46:53.9905553Z #define _POSIX_THREAD_KEYS_MAX 128 2025-05-07T19:46:53.9905853Z #define _POSIX_THREAD_THREADS_MAX 64 2025-05-07T19:46:53.9906154Z #define _POSIX_TIMER_MAX 32 2025-05-07T19:46:53.9906411Z #define _POSIX_TTY_NAME_MAX 9 2025-05-07T19:46:53.9906687Z #define _POSIX_TZNAME_MAX 6 2025-05-07T19:46:53.9906943Z #define _POSIX_UIO_MAXIOV 16 2025-05-07T19:46:53.9907296Z #define _PSTL_ASSERT(_Condition) __glibcxx_assert(_Condition) 2025-05-07T19:46:53.9907795Z #define _PSTL_ASSERT_MSG(_Condition,_Message) __glibcxx_assert(_Condition) 2025-05-07T19:46:53.9908418Z #define _PSTL_CLANG_VERSION (__clang_major__ * 10000 + __clang_minor__ * 100 + __clang_patchlevel__) 2025-05-07T19:46:53.9908908Z #define _PSTL_CONFIG_H 2025-05-07T19:46:53.9909382Z #define _PSTL_CPP11_STD_ROTATE_BROKEN ((__GLIBCXX__ && __GLIBCXX__ < 20150716) || (_MSC_VER && _MSC_VER < 1800)) 2025-05-07T19:46:53.9910257Z #define _PSTL_CPP14_2RANGE_MISMATCH_EQUAL_PRESENT (_MSC_VER >= 1900 || __cplusplus >= 201300L || __cpp_lib_robust_nonmodifying_seq_ops == 201304) 2025-05-07T19:46:53.9911069Z #define _PSTL_CPP14_INTEGER_SEQUENCE_PRESENT (_MSC_VER >= 1900 || __cplusplus >= 201402L) 2025-05-07T19:46:53.9911869Z #define _PSTL_CPP14_MAKE_REVERSE_ITERATOR_PRESENT (_MSC_VER >= 1900 || __cplusplus >= 201402L || __cpp_lib_make_reverse_iterator == 201402) 2025-05-07T19:46:53.9912862Z #define _PSTL_CPP14_VARIABLE_TEMPLATES_PRESENT (!__INTEL_COMPILER || __INTEL_COMPILER >= 1700) && (_MSC_FULL_VER >= 190023918 || __cplusplus >= 201402L) 2025-05-07T19:46:53.9913618Z #define _PSTL_CPP17_EXECUTION_POLICIES_PRESENT (_MSC_VER >= 1912) 2025-05-07T19:46:53.9914088Z #define _PSTL_EARLYEXIT_PRESENT (__INTEL_COMPILER >= 1800) 2025-05-07T19:46:53.9914593Z #define _PSTL_GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) 2025-05-07T19:46:53.9915057Z #define _PSTL_HIDE_FROM_ABI_POP 2025-05-07T19:46:53.9915343Z #define _PSTL_HIDE_FROM_ABI_PUSH 2025-05-07T19:46:53.9915712Z #define _PSTL_ICC_18_OMP_SIMD_BROKEN (__INTEL_COMPILER == 1800) 2025-05-07T19:46:53.9916164Z #define _PSTL_MONOTONIC_PRESENT (__INTEL_COMPILER >= 1800) 2025-05-07T19:46:53.9916529Z #define _PSTL_PAR_BACKEND_SERIAL 2025-05-07T19:46:53.9916831Z #define _PSTL_PRAGMA(x) _Pragma(# x) 2025-05-07T19:46:53.9917485Z #define _PSTL_PRAGMA_DECLARE_REDUCTION(NAME,OP) _PSTL_PRAGMA(omp declare reduction(NAME:OP : omp_out(omp_in)) initializer(omp_priv = omp_orig)) 2025-05-07T19:46:53.9918321Z #define _PSTL_PRAGMA_DECLARE_SIMD _PSTL_PRAGMA(omp declare simd) 2025-05-07T19:46:53.9918716Z #define _PSTL_PRAGMA_FORCEINLINE 2025-05-07T19:46:53.9919082Z #define _PSTL_PRAGMA_LOCATION " [Parallel STL message]: " 2025-05-07T19:46:53.9919444Z #define _PSTL_PRAGMA_MESSAGE(x) 2025-05-07T19:46:53.9919966Z #define _PSTL_PRAGMA_MESSAGE_IMPL(x) _PSTL_PRAGMA(message(_PSTL_STRING_CONCAT(_PSTL_PRAGMA_LOCATION, x))) 2025-05-07T19:46:53.9920532Z #define _PSTL_PRAGMA_MESSAGE_POLICIES(x) 2025-05-07T19:46:53.9920876Z #define _PSTL_PRAGMA_SIMD _PSTL_PRAGMA(omp simd) 2025-05-07T19:46:53.9921226Z #define _PSTL_PRAGMA_SIMD_EARLYEXIT 2025-05-07T19:46:53.9921763Z #define _PSTL_PRAGMA_SIMD_EXCLUSIVE_SCAN(PRM) 2025-05-07T19:46:53.9922099Z #define _PSTL_PRAGMA_SIMD_INCLUSIVE_SCAN(PRM) 2025-05-07T19:46:53.9922432Z #define _PSTL_PRAGMA_SIMD_ORDERED_MONOTONIC(PRM) 2025-05-07T19:46:53.9922825Z #define _PSTL_PRAGMA_SIMD_ORDERED_MONOTONIC_2ARGS(PRM1,PRM2) 2025-05-07T19:46:53.9923312Z #define _PSTL_PRAGMA_SIMD_REDUCTION(PRM) _PSTL_PRAGMA(omp simd reduction(PRM)) 2025-05-07T19:46:53.9923725Z #define _PSTL_PRAGMA_SIMD_SCAN(PRM) 2025-05-07T19:46:53.9924015Z #define _PSTL_PRAGMA_VECTOR_UNALIGNED 2025-05-07T19:46:53.9924366Z #define _PSTL_STRING(x) _PSTL_STRING_AUX(x) 2025-05-07T19:46:53.9924668Z #define _PSTL_STRING_AUX(x) #x 2025-05-07T19:46:53.9924924Z #define _PSTL_STRING_CONCAT(x,y) x #y 2025-05-07T19:46:53.9925200Z #define _PSTL_UDR_PRESENT 0 2025-05-07T19:46:53.9925605Z #define _PSTL_UDS_PRESENT (__INTEL_COMPILER >= 1900 && __INTEL_COMPILER_BUILD_DATE >= 20180626) 2025-05-07T19:46:53.9926069Z #define _PSTL_USAGE_WARNINGS 0 2025-05-07T19:46:53.9926365Z #define _PSTL_USE_NONTEMPORAL_STORES_IF_ALLOWED 2025-05-07T19:46:53.9926672Z #define _PSTL_VERSION 12000 2025-05-07T19:46:53.9926964Z #define _PSTL_VERSION_MAJOR (_PSTL_VERSION / 1000) 2025-05-07T19:46:53.9927325Z #define _PSTL_VERSION_MINOR ((_PSTL_VERSION % 1000) / 10) 2025-05-07T19:46:53.9927696Z #define _PSTL_VERSION_PATCH (_PSTL_VERSION % 10) 2025-05-07T19:46:53.9927995Z #define _PTRDIFF_T 2025-05-07T19:46:53.9928209Z #define _PTR_TRAITS_H 1 2025-05-07T19:46:53.9928434Z #define _SIGSET_H_types 1 2025-05-07T19:46:53.9928757Z #define _SIGSET_NWORDS (1024 / (8 * sizeof (unsigned long int))) 2025-05-07T19:46:53.9929098Z #define _SIZE_T 2025-05-07T19:46:53.9929311Z #define _STDC_PREDEF_H 1 2025-05-07T19:46:53.9929544Z #define _STDIO_H 1 2025-05-07T19:46:53.9929754Z #define _STDIO_USES_IOSTREAM 2025-05-07T19:46:53.9930006Z #define _STDLIB_H 1 2025-05-07T19:46:53.9930211Z #define _STL_ALGOBASE_H 1 2025-05-07T19:46:53.9930464Z #define _STL_ITERATOR_BASE_FUNCS_H 1 2025-05-07T19:46:53.9930739Z #define _STL_ITERATOR_BASE_TYPES_H 1 2025-05-07T19:46:53.9931018Z #define _STL_ITERATOR_H 1 2025-05-07T19:46:53.9931243Z #define _STL_PAIR_H 1 2025-05-07T19:46:53.9931471Z #define _STL_RELOPS_H 1 2025-05-07T19:46:53.9931685Z #define _STRING_H 1 2025-05-07T19:46:53.9931909Z #define _STRUCT_TIMEVAL 1 2025-05-07T19:46:53.9932145Z #define _SVID_SOURCE 1 2025-05-07T19:46:53.9932363Z #define _SYS_CDEFS_H 1 2025-05-07T19:46:53.9932593Z #define _SYS_SELECT_H 1 2025-05-07T19:46:53.9933071Z #define _SYS_SYSMACROS_H 1 2025-05-07T19:46:53.9933317Z #define _SYS_TYPES_H 1 2025-05-07T19:46:53.9933531Z #define _TIME_H 1 2025-05-07T19:46:53.9933752Z #define _VA_LIST_DEFINED 2025-05-07T19:46:53.9933977Z #define _XLOCALE_H 1 2025-05-07T19:46:53.9934220Z #define _XOPEN_IOV_MAX _POSIX_UIO_MAXIOV 2025-05-07T19:46:53.9934495Z #define _XOPEN_LIM_H 1 2025-05-07T19:46:53.9934729Z #define _XOPEN_SOURCE 700 2025-05-07T19:46:53.9934978Z #define _XOPEN_SOURCE_EXTENDED 1 2025-05-07T19:46:53.9935396Z #define __ASMNAME(cname) __ASMNAME2 (__USER_LABEL_PREFIX__, cname) 2025-05-07T19:46:53.9936007Z #define __ASMNAME2(prefix,cname) __STRING (prefix) cname 2025-05-07T19:46:53.9936390Z #define __ASSERT_FUNCTION __PRETTY_FUNCTION__ 2025-05-07T19:46:53.9936736Z #define __ASSERT_VOID_CAST static_cast 2025-05-07T19:46:53.9937054Z #define __ATOMIC_ACQUIRE 2 2025-05-07T19:46:53.9937301Z #define __ATOMIC_ACQ_REL 4 2025-05-07T19:46:53.9937643Z #define __ATOMIC_CONSUME 1 2025-05-07T19:46:53.9937888Z #define __ATOMIC_RELAXED 0 2025-05-07T19:46:53.9938145Z #define __ATOMIC_RELEASE 3 2025-05-07T19:46:53.9938409Z #define __ATOMIC_SEQ_CST 5 2025-05-07T19:46:53.9938669Z #define __BEGIN_DECLS extern "C" { 2025-05-07T19:46:53.9938972Z #define __BEGIN_NAMESPACE_C99 2025-05-07T19:46:53.9939244Z #define __BEGIN_NAMESPACE_STD 2025-05-07T19:46:53.9939524Z #define __BIGGEST_ALIGNMENT__ 16 2025-05-07T19:46:53.9939793Z #define __BIG_ENDIAN 4321 2025-05-07T19:46:53.9940063Z #define __BITINT_MAXWIDTH__ 8388608 2025-05-07T19:46:53.9940348Z #define __BIT_TYPES_DEFINED__ 1 2025-05-07T19:46:53.9940639Z #define __BLKCNT64_T_TYPE __SQUAD_TYPE 2025-05-07T19:46:53.9940951Z #define __BLKCNT_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:53.9941294Z #define __BLKSIZE_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:53.9941615Z #define __BOOL_WIDTH__ 8 2025-05-07T19:46:53.9941866Z #define __BYTE_ORDER __LITTLE_ENDIAN 2025-05-07T19:46:53.9942192Z #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:46:53.9942511Z #define __CHANNEL_DESCRIPTOR_H__ 2025-05-07T19:46:53.9942808Z #define __CHAR16_TYPE__ unsigned short 2025-05-07T19:46:53.9943162Z #define __CHAR32_TYPE__ unsigned int 2025-05-07T19:46:53.9943453Z #define __CHAR_BIT__ 8 2025-05-07T19:46:53.9943701Z #define __CLANG_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:46:53.9944038Z #define __CLANG_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:46:53.9944359Z #define __CLANG_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:46:53.9944687Z #define __CLANG_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:46:53.9945008Z #define __CLANG_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:46:53.9945312Z #define __CLANG_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:46:53.9945622Z #define __CLANG_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:46:53.9945934Z #define __CLANG_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:46:53.9946244Z #define __CLANG_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:46:53.9946560Z #define __CLANG_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:46:53.9946851Z #define __CLANG_LIMITS_H 2025-05-07T19:46:53.9947113Z #define __CLANG_MAX_ALIGN_T_DEFINED 2025-05-07T19:46:53.9947401Z #define __CLOCKID_T_TYPE __S32_TYPE 2025-05-07T19:46:53.9947696Z #define __CLOCK_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:53.9948004Z #define __COMMON_FUNCTIONS_H__ 2025-05-07T19:46:53.9948259Z #define __COMPAR_FN_T 2025-05-07T19:46:53.9948500Z #define __CONCAT(x,y) x ## y 2025-05-07T19:46:53.9948753Z #define __CONSTANT_CFSTRINGS__ 1 2025-05-07T19:46:53.9949045Z #define __CUDACC_DEVICE_ATOMIC_BUILTINS__ 1 2025-05-07T19:46:53.9949342Z #define __CUDACC_VER_BUILD__ 61 2025-05-07T19:46:53.9949616Z #define __CUDACC_VER_MAJOR__ 12 2025-05-07T19:46:53.9949874Z #define __CUDACC_VER_MINOR__ 8 2025-05-07T19:46:53.9950484Z #define __CUDACC_VER__ "__CUDACC_VER__ is no longer supported. Use __CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__, and __CUDACC_VER_BUILD__ instead." 2025-05-07T19:46:53.9951111Z #define __CUDACC__ 1 2025-05-07T19:46:53.9951449Z #define __CUDART_API_PTDS(api) api 2025-05-07T19:46:53.9951739Z #define __CUDART_API_PTSZ(api) api 2025-05-07T19:46:53.9952172Z #define __CUDART_API_VERSION ((__CUDA_API_VER_MAJOR__ * 1000) + (__CUDA_API_VER_MINOR__ * 10)) 2025-05-07T19:46:53.9952641Z #define __CUDA_API_VER_MAJOR__ 12 2025-05-07T19:46:53.9952902Z #define __CUDA_API_VER_MINOR__ 8 2025-05-07T19:46:53.9953240Z #define __CUDA_ARCH_HAS_FEATURE__(_FEAT) __CUDA_ARCH_FEAT_##_FEAT 2025-05-07T19:46:53.9953618Z #define __CUDA_ARCH_LIST__ 520 2025-05-07T19:46:53.9953863Z #define __CUDA_ARCH__ 520 2025-05-07T19:46:53.9954113Z #define __CUDA_DEVICE_RUNTIME_API_H__ 2025-05-07T19:46:53.9954386Z #define __CUDA_MATH_CRTIMP 2025-05-07T19:46:53.9954635Z #define __CUDA_RUNTIME_API_H__ 2025-05-07T19:46:53.9954879Z #define __CUDA_RUNTIME_H__ 2025-05-07T19:46:53.9955127Z #define __DADDR_T_TYPE __S32_TYPE 2025-05-07T19:46:53.9955383Z #define __DBL_DECIMAL_DIG__ 17 2025-05-07T19:46:53.9955660Z #define __DBL_DENORM_MIN__ 4.9406564584124654e-324 2025-05-07T19:46:53.9955953Z #define __DBL_DIG__ 15 2025-05-07T19:46:53.9956269Z #define __DBL_EPSILON__ 2.2204460492503131e-16 2025-05-07T19:46:53.9956569Z #define __DBL_HAS_DENORM__ 1 2025-05-07T19:46:53.9956813Z #define __DBL_HAS_INFINITY__ 1 2025-05-07T19:46:53.9957067Z #define __DBL_HAS_QUIET_NAN__ 1 2025-05-07T19:46:53.9957312Z #define __DBL_MANT_DIG__ 53 2025-05-07T19:46:53.9957554Z #define __DBL_MAX_10_EXP__ 308 2025-05-07T19:46:53.9957795Z #define __DBL_MAX_EXP__ 1024 2025-05-07T19:46:53.9958058Z #define __DBL_MAX__ 1.7976931348623157e+308 2025-05-07T19:46:53.9958333Z #define __DBL_MIN_10_EXP__ (-307) 2025-05-07T19:46:53.9958600Z #define __DBL_MIN_EXP__ (-1021) 2025-05-07T19:46:53.9958855Z #define __DBL_MIN__ 2.2250738585072014e-308 2025-05-07T19:46:53.9959162Z #define __DECIMAL_DIG__ __LDBL_DECIMAL_DIG__ 2025-05-07T19:46:53.9959464Z #define __DELETE_THROW throw() 2025-05-07T19:46:53.9959705Z #define __DEPRECATED 1 2025-05-07T19:46:53.9959948Z #define __DEVICE_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:53.9960242Z #define __DEVICE_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:53.9960542Z #define __DEVICE_DOUBLE_FUNCTIONS_HPP__ 2025-05-07T19:46:53.9960831Z #define __DEVICE_DOUBLE_FUNCTIONS_H__ 2025-05-07T19:46:53.9961116Z #define __DEVICE_FUNCTIONS_HPP__ 2025-05-07T19:46:53.9961455Z #define __DEVICE_FUNCTIONS_H__ 2025-05-07T19:46:53.9961735Z #define __DEVICE_LAUNCH_PARAMETERS_H__ 2025-05-07T19:46:53.9962027Z #define __DEVICE_TYPES_H__ 2025-05-07T19:46:53.9962269Z #define __DEV_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:53.9962541Z #define __DRIVER_FUNCTIONS_H__ 2025-05-07T19:46:53.9962791Z #define __DRIVER_TYPES_H__ 2025-05-07T19:46:53.9963030Z #define __ELF__ 1 2025-05-07T19:46:53.9963228Z #define __END_DECLS } 2025-05-07T19:46:53.9963461Z #define __END_NAMESPACE_C99 2025-05-07T19:46:53.9963705Z #define __END_NAMESPACE_STD 2025-05-07T19:46:53.9963954Z #define __EXCEPTIONS 1 2025-05-07T19:46:53.9964171Z #define __EXCEPTION_H 1 2025-05-07T19:46:53.9964416Z #define __FDS_BITS(set) ((set)->fds_bits) 2025-05-07T19:46:53.9964819Z #define __FD_CLR(d,set) ((void) (__FDS_BITS (set)[__FD_ELT (d)] &= ~__FD_MASK (d))) 2025-05-07T19:46:53.9965221Z #define __FD_ELT(d) ((d) / __NFDBITS) 2025-05-07T19:46:53.9965604Z #define __FD_ISSET(d,set) ((__FDS_BITS (set)[__FD_ELT (d)] & __FD_MASK (d)) != 0) 2025-05-07T19:46:53.9966049Z #define __FD_MASK(d) ((__fd_mask) 1 << ((d) % __NFDBITS)) 2025-05-07T19:46:53.9966487Z #define __FD_SET(d,set) ((void) (__FDS_BITS (set)[__FD_ELT (d)] |= __FD_MASK (d))) 2025-05-07T19:46:53.9967043Z #define __FD_SETSIZE 1024 2025-05-07T19:46:53.9967729Z #define __FD_ZERO(fdsp) do { int __d0, __d1; __asm__ __volatile__ ("cld; rep; " __FD_ZERO_STOS : "=c" (__d0), "=D" (__d1) : "a" (0), "0" (sizeof (fd_set) / sizeof (__fd_mask)), "1" (&__FDS_BITS (fdsp)[0]) : "memory"); } while (0) 2025-05-07T19:46:53.9968465Z #define __FD_ZERO_STOS "stosq" 2025-05-07T19:46:53.9968719Z #define __FILE_defined 1 2025-05-07T19:46:53.9968969Z #define __FINITE_MATH_ONLY__ 0 2025-05-07T19:46:53.9969217Z #define __FLOAT128__ 1 2025-05-07T19:46:53.9969465Z #define __FLOAT_WORD_ORDER __BYTE_ORDER 2025-05-07T19:46:53.9969756Z #define __FLT16_DECIMAL_DIG__ 5 2025-05-07T19:46:53.9970054Z #define __FLT16_DENORM_MIN__ 5.9604644775390625e-8F16 2025-05-07T19:46:53.9970368Z #define __FLT16_DIG__ 3 2025-05-07T19:46:53.9970617Z #define __FLT16_EPSILON__ 9.765625e-4F16 2025-05-07T19:46:53.9970909Z #define __FLT16_HAS_DENORM__ 1 2025-05-07T19:46:53.9971166Z #define __FLT16_HAS_INFINITY__ 1 2025-05-07T19:46:53.9971437Z #define __FLT16_HAS_QUIET_NAN__ 1 2025-05-07T19:46:53.9971702Z #define __FLT16_MANT_DIG__ 11 2025-05-07T19:46:53.9971957Z #define __FLT16_MAX_10_EXP__ 4 2025-05-07T19:46:53.9972208Z #define __FLT16_MAX_EXP__ 16 2025-05-07T19:46:53.9972463Z #define __FLT16_MAX__ 6.5504e+4F16 2025-05-07T19:46:53.9972730Z #define __FLT16_MIN_10_EXP__ (-4) 2025-05-07T19:46:53.9973002Z #define __FLT16_MIN_EXP__ (-13) 2025-05-07T19:46:53.9973262Z #define __FLT16_MIN__ 6.103515625e-5F16 2025-05-07T19:46:53.9973549Z #define __FLT_DECIMAL_DIG__ 9 2025-05-07T19:46:53.9973820Z #define __FLT_DENORM_MIN__ 1.40129846e-45F 2025-05-07T19:46:53.9974164Z #define __FLT_DIG__ 6 2025-05-07T19:46:53.9974401Z #define __FLT_EPSILON__ 1.19209290e-7F 2025-05-07T19:46:53.9974838Z #define __FLT_HAS_DENORM__ 1 2025-05-07T19:46:53.9975106Z #define __FLT_HAS_INFINITY__ 1 2025-05-07T19:46:53.9975422Z #define __FLT_HAS_QUIET_NAN__ 1 2025-05-07T19:46:53.9975690Z #define __FLT_MANT_DIG__ 24 2025-05-07T19:46:53.9975932Z #define __FLT_MAX_10_EXP__ 38 2025-05-07T19:46:53.9976193Z #define __FLT_MAX_EXP__ 128 2025-05-07T19:46:53.9976434Z #define __FLT_MAX__ 3.40282347e+38F 2025-05-07T19:46:53.9976721Z #define __FLT_MIN_10_EXP__ (-37) 2025-05-07T19:46:53.9977011Z #define __FLT_MIN_EXP__ (-125) 2025-05-07T19:46:53.9977277Z #define __FLT_MIN__ 1.17549435e-38F 2025-05-07T19:46:53.9977564Z #define __FLT_RADIX__ 2 2025-05-07T19:46:53.9977821Z #define __FSBLKCNT64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:53.9978165Z #define __FSBLKCNT_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:53.9978496Z #define __FSFILCNT64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:53.9978839Z #define __FSFILCNT_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:53.9979186Z #define __FSID_T_TYPE struct { int __val[2]; } 2025-05-07T19:46:53.9979543Z #define __FSWORD_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:53.9979957Z #define __FXSR__ 1 2025-05-07T19:46:53.9980210Z #define __GCC_ASM_FLAG_OUTPUTS__ 1 2025-05-07T19:46:53.9980521Z #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:46:53.9980827Z #define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:46:53.9981157Z #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:46:53.9981467Z #define __GCC_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:46:53.9981779Z #define __GCC_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:46:53.9982080Z #define __GCC_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:46:53.9982398Z #define __GCC_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:46:53.9982704Z #define __GCC_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:46:53.9983033Z #define __GCC_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:46:53.9983365Z #define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 2025-05-07T19:46:53.9983690Z #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:46:53.9984004Z #define __GCC_HAVE_DWARF2_CFI_ASM 1 2025-05-07T19:46:53.9984305Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 2025-05-07T19:46:53.9984650Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 2025-05-07T19:46:53.9984974Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 2025-05-07T19:46:53.9985310Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 2025-05-07T19:46:53.9985617Z #define __GID_T_TYPE __U32_TYPE 2025-05-07T19:46:53.9985903Z #define __GLIBCXX_BITSIZE_INT_N_0 128 2025-05-07T19:46:53.9986199Z #define __GLIBCXX_TYPE_INT_N_0 __int128 2025-05-07T19:46:53.9986501Z #define __GLIBCXX__ 20230528 2025-05-07T19:46:53.9986777Z #define __GLIBC_HAVE_LONG_LONG 1 2025-05-07T19:46:53.9987045Z #define __GLIBC_MINOR__ 17 2025-05-07T19:46:53.9987461Z #define __GLIBC_PREREQ(maj,min) ((__GLIBC__ << 16) + __GLIBC_MINOR__ >= ((maj) << 16) + (min)) 2025-05-07T19:46:53.9988013Z #define __GLIBC__ 2 2025-05-07T19:46:53.9988251Z #define __GNUC_GNU_INLINE__ 1 2025-05-07T19:46:53.9988510Z #define __GNUC_MINOR__ 2 2025-05-07T19:46:53.9988770Z #define __GNUC_PATCHLEVEL__ 1 2025-05-07T19:46:53.9989160Z #define __GNUC_PREREQ(maj,min) ((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min)) 2025-05-07T19:46:53.9989600Z #define __GNUC_VA_LIST 2025-05-07T19:46:53.9989838Z #define __GNUC__ 4 2025-05-07T19:46:53.9990046Z #define __GNUG__ 4 2025-05-07T19:46:53.9990272Z #define __GNU_LIBRARY__ 6 2025-05-07T19:46:53.9990516Z #define __GXX_ABI_VERSION 1002 2025-05-07T19:46:53.9990796Z #define __GXX_EXPERIMENTAL_CXX0X__ 1 2025-05-07T19:46:53.9991071Z #define __GXX_RTTI 1 2025-05-07T19:46:53.9991304Z #define __GXX_WEAK__ 1 2025-05-07T19:46:53.9991631Z #define __HAVE_COLUMN 2025-05-07T19:46:53.9991861Z #define __HOST_CONFIG_H__ 2025-05-07T19:46:53.9992095Z #define __HOST_DEFINES_H__ 2025-05-07T19:46:53.9992347Z #define __ID_T_TYPE __U32_TYPE 2025-05-07T19:46:53.9992608Z #define __INO64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:53.9992873Z #define __INO_T_MATCHES_INO64_T 1 2025-05-07T19:46:53.9993252Z #define __INO_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:53.9993532Z #define __INT16_C_SUFFIX__ 2025-05-07T19:46:53.9993781Z #define __INT16_FMTd__ "hd" 2025-05-07T19:46:53.9994015Z #define __INT16_FMTi__ "hi" 2025-05-07T19:46:53.9994265Z #define __INT16_MAX__ 32767 2025-05-07T19:46:53.9994499Z #define __INT16_TYPE__ short 2025-05-07T19:46:53.9994753Z #define __INT32_C_SUFFIX__ 2025-05-07T19:46:53.9994985Z #define __INT32_FMTd__ "d" 2025-05-07T19:46:53.9995228Z #define __INT32_FMTi__ "i" 2025-05-07T19:46:53.9995459Z #define __INT32_MAX__ 2147483647 2025-05-07T19:46:53.9995722Z #define __INT32_TYPE__ int 2025-05-07T19:46:53.9995965Z #define __INT64_C_SUFFIX__ L 2025-05-07T19:46:53.9996200Z #define __INT64_FMTd__ "ld" 2025-05-07T19:46:53.9996445Z #define __INT64_FMTi__ "li" 2025-05-07T19:46:53.9996687Z #define __INT64_MAX__ 9223372036854775807L 2025-05-07T19:46:53.9996974Z #define __INT64_TYPE__ long int 2025-05-07T19:46:53.9997216Z #define __INT8_C_SUFFIX__ 2025-05-07T19:46:53.9997458Z #define __INT8_FMTd__ "hhd" 2025-05-07T19:46:53.9997691Z #define __INT8_FMTi__ "hhi" 2025-05-07T19:46:53.9997934Z #define __INT8_MAX__ 127 2025-05-07T19:46:53.9998165Z #define __INT8_TYPE__ signed char 2025-05-07T19:46:53.9999697Z #define __INTMAX_C_SUFFIX__ L 2025-05-07T19:46:53.9999976Z #define __INTMAX_FMTd__ "ld" 2025-05-07T19:46:54.0000226Z #define __INTMAX_FMTi__ "li" 2025-05-07T19:46:54.0000526Z #define __INTMAX_MAX__ 9223372036854775807L 2025-05-07T19:46:54.0000991Z #define __INTMAX_TYPE__ long int 2025-05-07T19:46:54.0001274Z #define __INTMAX_WIDTH__ 64 2025-05-07T19:46:54.0001528Z #define __INTPTR_FMTd__ "ld" 2025-05-07T19:46:54.0001803Z #define __INTPTR_FMTi__ "li" 2025-05-07T19:46:54.0002072Z #define __INTPTR_MAX__ 9223372036854775807L 2025-05-07T19:46:54.0002388Z #define __INTPTR_TYPE__ long int 2025-05-07T19:46:54.0002660Z #define __INTPTR_WIDTH__ 64 2025-05-07T19:46:54.0002934Z #define __INT_FAST16_FMTd__ "hd" 2025-05-07T19:46:54.0003033Z #define __INT_FAST16_FMTi__ "hi" 2025-05-07T19:46:54.0003149Z #define __INT_FAST16_MAX__ 32767 2025-05-07T19:46:54.0003252Z #define __INT_FAST16_TYPE__ short 2025-05-07T19:46:54.0003348Z #define __INT_FAST16_WIDTH__ 16 2025-05-07T19:46:54.0003445Z #define __INT_FAST32_FMTd__ "d" 2025-05-07T19:46:54.0003559Z #define __INT_FAST32_FMTi__ "i" 2025-05-07T19:46:54.0003660Z #define __INT_FAST32_MAX__ 2147483647 2025-05-07T19:46:54.0003755Z #define __INT_FAST32_TYPE__ int 2025-05-07T19:46:54.0003862Z #define __INT_FAST32_WIDTH__ 32 2025-05-07T19:46:54.0003956Z #define __INT_FAST64_FMTd__ "ld" 2025-05-07T19:46:54.0004051Z #define __INT_FAST64_FMTi__ "li" 2025-05-07T19:46:54.0004171Z #define __INT_FAST64_MAX__ 9223372036854775807L 2025-05-07T19:46:54.0004286Z #define __INT_FAST64_TYPE__ long int 2025-05-07T19:46:54.0004380Z #define __INT_FAST64_WIDTH__ 64 2025-05-07T19:46:54.0004475Z #define __INT_FAST8_FMTd__ "hhd" 2025-05-07T19:46:54.0004584Z #define __INT_FAST8_FMTi__ "hhi" 2025-05-07T19:46:54.0004678Z #define __INT_FAST8_MAX__ 127 2025-05-07T19:46:54.0004785Z #define __INT_FAST8_TYPE__ signed char 2025-05-07T19:46:54.0004885Z #define __INT_FAST8_WIDTH__ 8 2025-05-07T19:46:54.0004997Z #define __INT_LEAST16_FMTd__ "hd" 2025-05-07T19:46:54.0005098Z #define __INT_LEAST16_FMTi__ "hi" 2025-05-07T19:46:54.0005193Z #define __INT_LEAST16_MAX__ 32767 2025-05-07T19:46:54.0005298Z #define __INT_LEAST16_TYPE__ short 2025-05-07T19:46:54.0005393Z #define __INT_LEAST16_WIDTH__ 16 2025-05-07T19:46:54.0005484Z #define __INT_LEAST32_FMTd__ "d" 2025-05-07T19:46:54.0005589Z #define __INT_LEAST32_FMTi__ "i" 2025-05-07T19:46:54.0005689Z #define __INT_LEAST32_MAX__ 2147483647 2025-05-07T19:46:54.0005785Z #define __INT_LEAST32_TYPE__ int 2025-05-07T19:46:54.0005878Z #define __INT_LEAST32_WIDTH__ 32 2025-05-07T19:46:54.0005984Z #define __INT_LEAST64_FMTd__ "ld" 2025-05-07T19:46:54.0006080Z #define __INT_LEAST64_FMTi__ "li" 2025-05-07T19:46:54.0006198Z #define __INT_LEAST64_MAX__ 9223372036854775807L 2025-05-07T19:46:54.0006305Z #define __INT_LEAST64_TYPE__ long int 2025-05-07T19:46:54.0006396Z #define __INT_LEAST64_WIDTH__ 64 2025-05-07T19:46:54.0006558Z #define __INT_LEAST8_FMTd__ "hhd" 2025-05-07T19:46:54.0006651Z #define __INT_LEAST8_FMTi__ "hhi" 2025-05-07T19:46:54.0006759Z #define __INT_LEAST8_MAX__ 127 2025-05-07T19:46:54.0006867Z #define __INT_LEAST8_TYPE__ signed char 2025-05-07T19:46:54.0006961Z #define __INT_LEAST8_WIDTH__ 8 2025-05-07T19:46:54.0007061Z #define __INT_MAX__ 2147483647 2025-05-07T19:46:54.0007151Z #define __INT_WIDTH__ 32 2025-05-07T19:46:54.0007248Z #define __KERNEL_STRICT_NAMES 2025-05-07T19:46:54.0007343Z #define __KEY_T_TYPE __S32_TYPE 2025-05-07T19:46:54.0007447Z #define __LDBL_DECIMAL_DIG__ 21 2025-05-07T19:46:54.0007592Z #define __LDBL_DENORM_MIN__ 3.64519953188247460253e-4951L 2025-05-07T19:46:54.0007683Z #define __LDBL_DIG__ 18 2025-05-07T19:46:54.0007817Z #define __LDBL_EPSILON__ 1.08420217248550443401e-19L 2025-05-07T19:46:54.0007911Z #define __LDBL_HAS_DENORM__ 1 2025-05-07T19:46:54.0008012Z #define __LDBL_HAS_INFINITY__ 1 2025-05-07T19:46:54.0008108Z #define __LDBL_HAS_QUIET_NAN__ 1 2025-05-07T19:46:54.0008219Z #define __LDBL_MANT_DIG__ 64 2025-05-07T19:46:54.0008315Z #define __LDBL_MAX_10_EXP__ 4932 2025-05-07T19:46:54.0008411Z #define __LDBL_MAX_EXP__ 16384 2025-05-07T19:46:54.0008593Z #define __LDBL_MAX__ 1.18973149535723176502e+4932L 2025-05-07T19:46:54.0008691Z #define __LDBL_MIN_10_EXP__ (-4931) 2025-05-07T19:46:54.0008790Z #define __LDBL_MIN_EXP__ (-16381) 2025-05-07T19:46:54.0008907Z #define __LDBL_MIN__ 3.36210314311209350626e-4932L 2025-05-07T19:46:54.0009035Z #define __LDBL_REDIR(name,proto) name proto 2025-05-07T19:46:54.0009171Z #define __LDBL_REDIR1(name,proto,alias) name proto 2025-05-07T19:46:54.0009346Z #define __LDBL_REDIR1_NTH(name,proto,alias) name proto __THROW 2025-05-07T19:46:54.0009459Z #define __LDBL_REDIR_DECL(name) 2025-05-07T19:46:54.0009604Z #define __LDBL_REDIR_NTH(name,proto) name proto __THROW 2025-05-07T19:46:54.0009689Z #define __LEAF 2025-05-07T19:46:54.0009785Z #define __LEAF_ATTR 2025-05-07T19:46:54.0009881Z #define __LIBRARY_TYPES_H__ 2025-05-07T19:46:54.0009978Z #define __LITTLE_ENDIAN 1234 2025-05-07T19:46:54.0010069Z #define __LITTLE_ENDIAN__ 1 2025-05-07T19:46:54.0010174Z #define __LLONG_WIDTH__ 64 2025-05-07T19:46:54.0010292Z #define __LONG_LONG_MAX__ 9223372036854775807LL 2025-05-07T19:46:54.0010396Z #define __LONG_LONG_PAIR(HI,LO) LO, HI 2025-05-07T19:46:54.0010510Z #define __LONG_MAX__ 9223372036854775807L 2025-05-07T19:46:54.0010597Z #define __LONG_WIDTH__ 64 2025-05-07T19:46:54.0010680Z #define __LP64__ 1 2025-05-07T19:46:54.0011012Z #define __MATHCALLX(function,suffix,args,attrib) __MATHDECLX (_Mdouble_,function,suffix, args, attrib) 2025-05-07T19:46:54.0011773Z #define __MATHDECLX(type,function,suffix,args,attrib) __MATHDECL_1(type, function,suffix, args) __attribute__ (attrib); __MATHDECL_1(type, __CONCAT(__,function),suffix, args) __attribute__ (attrib) 2025-05-07T19:46:54.0011870Z #define __MATH_DECLARE_LDOUBLE 1 2025-05-07T19:46:54.0011963Z #define __MATH_FUNCTIONS_HPP__ 2025-05-07T19:46:54.0012069Z #define __MATH_FUNCTIONS_H__ 2025-05-07T19:46:54.0012149Z #define __MMX__ 1 2025-05-07T19:46:54.0012239Z #define __MODE_T_TYPE __U32_TYPE 2025-05-07T19:46:54.0012326Z #define __N(msgid) (msgid) 2025-05-07T19:46:54.0012564Z #define __NFDBITS (8 * (int) sizeof (__fd_mask)) 2025-05-07T19:46:54.0012665Z #define __NLINK_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:54.0012740Z #define __NO_CTYPE 1 2025-05-07T19:46:54.0012833Z #define __NO_INLINE__ 1 2025-05-07T19:46:54.0012918Z #define __NO_MATH_INLINES 1 2025-05-07T19:46:54.0013016Z #define __NTH(fct) __LEAF_ATTR fct throw () 2025-05-07T19:46:54.0013122Z #define __NVCC_DIAG_PRAGMA_SUPPORT__ 1 2025-05-07T19:46:54.0013200Z #define __NVCC__ 1 2025-05-07T19:46:54.0013287Z #define __NV_GLIBCXX_VERSION 40800 2025-05-07T19:46:54.0013370Z #define __NV_LEGACY_LAUNCH 1 2025-05-07T19:46:54.0013477Z #define __NV_NO_HOST_COMPILER_CHECK 1 2025-05-07T19:46:54.0013562Z #define __OBJC_BOOL_IS_BOOL 0 2025-05-07T19:46:54.0013650Z #define __OFF64_T_TYPE __SQUAD_TYPE 2025-05-07T19:46:54.0013915Z #define __OFF_T_MATCHES_OFF64_T 1 2025-05-07T19:46:54.0014089Z #define __OFF_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:54.0014210Z #define __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES 3 2025-05-07T19:46:54.0014314Z #define __OPENCL_MEMORY_SCOPE_DEVICE 2 2025-05-07T19:46:54.0014437Z #define __OPENCL_MEMORY_SCOPE_SUB_GROUP 4 2025-05-07T19:46:54.0014547Z #define __OPENCL_MEMORY_SCOPE_WORK_GROUP 1 2025-05-07T19:46:54.0014648Z #define __OPENCL_MEMORY_SCOPE_WORK_ITEM 0 2025-05-07T19:46:54.0014754Z #define __ORDER_BIG_ENDIAN__ 4321 2025-05-07T19:46:54.0014849Z #define __ORDER_LITTLE_ENDIAN__ 1234 2025-05-07T19:46:54.0014941Z #define __ORDER_PDP_ENDIAN__ 3412 2025-05-07T19:46:54.0015027Z #define __P(args) args 2025-05-07T19:46:54.0015125Z #define __PDP_ENDIAN 3412 2025-05-07T19:46:54.0015282Z #define __PIC__ 2 2025-05-07T19:46:54.0015377Z #define __PID_T_TYPE __S32_TYPE 2025-05-07T19:46:54.0015462Z #define __PIE__ 2 2025-05-07T19:46:54.0015549Z #define __PMT(args) args 2025-05-07T19:46:54.0015808Z #define __POINTER_WIDTH__ 64 2025-05-07T19:46:54.0015914Z #define __PRAGMA_REDEFINE_EXTNAME 1 2025-05-07T19:46:54.0016023Z #define __PTHREAD_MUTEX_HAVE_PREV 1 2025-05-07T19:46:54.0016137Z #define __PTHREAD_RWLOCK_INT_FLAGS_SHARED 1 2025-05-07T19:46:54.0016288Z #define __PTHREAD_SPINS 0, 0 2025-05-07T19:46:54.0016390Z #define __PTRDIFF_FMTd__ "ld" 2025-05-07T19:46:54.0016480Z #define __PTRDIFF_FMTi__ "li" 2025-05-07T19:46:54.0016586Z #define __PTRDIFF_MAX__ 9223372036854775807L 2025-05-07T19:46:54.0016683Z #define __PTRDIFF_TYPE__ long int 2025-05-07T19:46:54.0016785Z #define __PTRDIFF_WIDTH__ 64 2025-05-07T19:46:54.0017006Z #define __REDIRECT(name,proto,alias) name proto __asm__ (__ASMNAME (#alias)) 2025-05-07T19:46:54.0017223Z #define __REDIRECT_LDBL(name,proto,alias) __REDIRECT (name, proto, alias) 2025-05-07T19:46:54.0017485Z #define __REDIRECT_NTH(name,proto,alias) name proto __THROW __asm__ (__ASMNAME (#alias)) 2025-05-07T19:46:54.0017757Z #define __REDIRECT_NTHNL(name,proto,alias) name proto __THROWNL __asm__ (__ASMNAME (#alias)) 2025-05-07T19:46:54.0018004Z #define __REDIRECT_NTH_LDBL(name,proto,alias) __REDIRECT_NTH (name, proto, alias) 2025-05-07T19:46:54.0018109Z #define __REGISTER_PREFIX__ 2025-05-07T19:46:54.0018208Z #define __RLIM64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:54.0018320Z #define __RLIM_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:54.0018412Z #define __S16_TYPE short int 2025-05-07T19:46:54.0018507Z #define __S32_TYPE int 2025-05-07T19:46:54.0018599Z #define __S64_TYPE long int 2025-05-07T19:46:54.0018688Z #define __SCHAR_MAX__ 127 2025-05-07T19:46:54.0018781Z #define __SEG_FS 1 2025-05-07T19:46:54.0018860Z #define __SEG_GS 1 2025-05-07T19:46:54.0018951Z #define __SHRT_MAX__ 32767 2025-05-07T19:46:54.0019038Z #define __SHRT_WIDTH__ 16 2025-05-07T19:46:54.0019146Z #define __SIG_ATOMIC_MAX__ 2147483647 2025-05-07T19:46:54.0019238Z #define __SIG_ATOMIC_WIDTH__ 32 2025-05-07T19:46:54.0019328Z #define __SIZEOF_DOUBLE__ 8 2025-05-07T19:46:54.0019433Z #define __SIZEOF_FLOAT128__ 16 2025-05-07T19:46:54.0019521Z #define __SIZEOF_FLOAT__ 4 2025-05-07T19:46:54.0019617Z #define __SIZEOF_INT128__ 16 2025-05-07T19:46:54.0019706Z #define __SIZEOF_INT__ 4 2025-05-07T19:46:54.0019813Z #define __SIZEOF_LONG_DOUBLE__ 16 2025-05-07T19:46:54.0019905Z #define __SIZEOF_LONG_LONG__ 8 2025-05-07T19:46:54.0019996Z #define __SIZEOF_LONG__ 8 2025-05-07T19:46:54.0020099Z #define __SIZEOF_POINTER__ 8 2025-05-07T19:46:54.0020195Z #define __SIZEOF_PTHREAD_ATTR_T 56 2025-05-07T19:46:54.0020305Z #define __SIZEOF_PTHREAD_BARRIERATTR_T 4 2025-05-07T19:46:54.0020406Z #define __SIZEOF_PTHREAD_BARRIER_T 32 2025-05-07T19:46:54.0020517Z #define __SIZEOF_PTHREAD_CONDATTR_T 4 2025-05-07T19:46:54.0020614Z #define __SIZEOF_PTHREAD_COND_T 48 2025-05-07T19:46:54.0020715Z #define __SIZEOF_PTHREAD_MUTEXATTR_T 4 2025-05-07T19:46:54.0020824Z #define __SIZEOF_PTHREAD_MUTEX_T 40 2025-05-07T19:46:54.0020927Z #define __SIZEOF_PTHREAD_RWLOCKATTR_T 8 2025-05-07T19:46:54.0021027Z #define __SIZEOF_PTHREAD_RWLOCK_T 56 2025-05-07T19:46:54.0021118Z #define __SIZEOF_PTRDIFF_T__ 8 2025-05-07T19:46:54.0021280Z #define __SIZEOF_SHORT__ 2 2025-05-07T19:46:54.0021367Z #define __SIZEOF_SIZE_T__ 8 2025-05-07T19:46:54.0021459Z #define __SIZEOF_WCHAR_T__ 4 2025-05-07T19:46:54.0021560Z #define __SIZEOF_WINT_T__ 4 2025-05-07T19:46:54.0021653Z #define __SIZE_FMTX__ "lX" 2025-05-07T19:46:54.0021742Z #define __SIZE_FMTo__ "lo" 2025-05-07T19:46:54.0021835Z #define __SIZE_FMTu__ "lu" 2025-05-07T19:46:54.0021935Z #define __SIZE_FMTx__ "lx" 2025-05-07T19:46:54.0022035Z #define __SIZE_MAX__ 18446744073709551615UL 2025-05-07T19:46:54.0022137Z #define __SIZE_TYPE__ long unsigned int 2025-05-07T19:46:54.0022235Z #define __SIZE_WIDTH__ 64 2025-05-07T19:46:54.0022325Z #define __SLONG32_TYPE int 2025-05-07T19:46:54.0022421Z #define __SLONGWORD_TYPE long int 2025-05-07T19:46:54.0022510Z #define __SM_100_RT_HPP__ 2025-05-07T19:46:54.0022605Z #define __SM_100_RT_H__ 2025-05-07T19:46:54.0022709Z #define __SM_20_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:54.0022807Z #define __SM_20_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:54.0022922Z #define __SM_20_INTRINSICS_HPP__ 2025-05-07T19:46:54.0023014Z #define __SM_20_INTRINSICS_H__ 2025-05-07T19:46:54.0023109Z #define __SM_30_INTRINSICS_HPP__ 2025-05-07T19:46:54.0023199Z #define __SM_30_INTRINSICS_H__ 2025-05-07T19:46:54.0023364Z #define __SM_32_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:54.0023466Z #define __SM_32_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:54.0023559Z #define __SM_32_INTRINSICS_HPP__ 2025-05-07T19:46:54.0023663Z #define __SM_32_INTRINSICS_H__ 2025-05-07T19:46:54.0023763Z #define __SM_35_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:54.0023852Z #define __SM_35_INTRINSICS_H__ 2025-05-07T19:46:54.0023964Z #define __SM_60_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:54.0024061Z #define __SM_60_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:54.0024157Z #define __SM_61_INTRINSICS_HPP__ 2025-05-07T19:46:54.0024253Z #define __SM_61_INTRINSICS_H__ 2025-05-07T19:46:54.0024353Z #define __SM_70_RT_HPP__ 2025-05-07T19:46:54.0024438Z #define __SM_70_RT_H__ 2025-05-07T19:46:54.0024527Z #define __SM_80_RT_HPP__ 2025-05-07T19:46:54.0024623Z #define __SM_80_RT_H__ 2025-05-07T19:46:54.0024709Z #define __SM_90_RT_HPP__ 2025-05-07T19:46:54.0024794Z #define __SM_90_RT_H__ 2025-05-07T19:46:54.0024887Z #define __SQUAD_TYPE long int 2025-05-07T19:46:54.0024985Z #define __SSE2_MATH__ 1 2025-05-07T19:46:54.0025064Z #define __SSE2__ 1 2025-05-07T19:46:54.0025147Z #define __SSE_MATH__ 1 2025-05-07T19:46:54.0025227Z #define __SSE__ 1 2025-05-07T19:46:54.0025335Z #define __SSIZE_T_TYPE __SWORD_TYPE 2025-05-07T19:46:54.0025456Z #define __STDCPP_DEFAULT_NEW_ALIGNMENT__ 16UL 2025-05-07T19:46:54.0025564Z #define __STDCPP_MATH_SPEC_FUNCS__ 201003L 2025-05-07T19:46:54.0025668Z #define __STDCPP_THREADS__ 1 2025-05-07T19:46:54.0025758Z #define __STDC_HOSTED__ 1 2025-05-07T19:46:54.0025856Z #define __STDC_IEC_559_COMPLEX__ 1 2025-05-07T19:46:54.0025951Z #define __STDC_IEC_559__ 1 2025-05-07T19:46:54.0026039Z #define __STDC_ISO_10646__ 201103L 2025-05-07T19:46:54.0026132Z #define __STDC_NO_THREADS__ 1 2025-05-07T19:46:54.0026225Z #define __STDC_UTF_16__ 1 2025-05-07T19:46:54.0026317Z #define __STDC_UTF_32__ 1 2025-05-07T19:46:54.0026398Z #define __STDC__ 1 2025-05-07T19:46:54.0026481Z #define __STDDEF_H 2025-05-07T19:46:54.0026576Z #define __STRING(x) #x 2025-05-07T19:46:54.0026689Z #define __SURFACE_INDIRECT_FUNCTIONS_H__ 2025-05-07T19:46:54.0026783Z #define __SURFACE_TYPES_H__ 2025-05-07T19:46:54.0026908Z #define __SUSECONDS_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:54.0027008Z #define __SWORD_TYPE long int 2025-05-07T19:46:54.0027127Z #define __SYSCALL_SLONG_TYPE __SLONGWORD_TYPE 2025-05-07T19:46:54.0027248Z #define __SYSCALL_ULONG_TYPE __ULONGWORD_TYPE 2025-05-07T19:46:54.0027351Z #define __SYSCALL_WORDSIZE 64 2025-05-07T19:46:54.0027459Z #define __TEXTURE_INDIRECT_FUNCTIONS_H__ 2025-05-07T19:46:54.0027552Z #define __TEXTURE_TYPES_H__ 2025-05-07T19:46:54.0027642Z #define __THROW throw () 2025-05-07T19:46:54.0027740Z #define __THROWNL throw () 2025-05-07T19:46:54.0027833Z #define __TIMER_T_TYPE void * 2025-05-07T19:46:54.0027995Z #define __TIME_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:54.0028105Z #define __U16_TYPE unsigned short int 2025-05-07T19:46:54.0028199Z #define __U32_TYPE unsigned int 2025-05-07T19:46:54.0028302Z #define __U64_TYPE unsigned long int 2025-05-07T19:46:54.0028394Z #define __UID_T_TYPE __U32_TYPE 2025-05-07T19:46:54.0028497Z #define __UINT16_C_SUFFIX__ 2025-05-07T19:46:54.0028588Z #define __UINT16_FMTX__ "hX" 2025-05-07T19:46:54.0028679Z #define __UINT16_FMTo__ "ho" 2025-05-07T19:46:54.0028780Z #define __UINT16_FMTu__ "hu" 2025-05-07T19:46:54.0028870Z #define __UINT16_FMTx__ "hx" 2025-05-07T19:46:54.0028960Z #define __UINT16_MAX__ 65535 2025-05-07T19:46:54.0029067Z #define __UINT16_TYPE__ unsigned short 2025-05-07T19:46:54.0029171Z #define __UINT32_C_SUFFIX__ U 2025-05-07T19:46:54.0029256Z #define __UINT32_FMTX__ "X" 2025-05-07T19:46:54.0029344Z #define __UINT32_FMTo__ "o" 2025-05-07T19:46:54.0029444Z #define __UINT32_FMTu__ "u" 2025-05-07T19:46:54.0029535Z #define __UINT32_FMTx__ "x" 2025-05-07T19:46:54.0029635Z #define __UINT32_MAX__ 4294967295U 2025-05-07T19:46:54.0029733Z #define __UINT32_TYPE__ unsigned int 2025-05-07T19:46:54.0029835Z #define __UINT64_C_SUFFIX__ UL 2025-05-07T19:46:54.0029928Z #define __UINT64_FMTX__ "lX" 2025-05-07T19:46:54.0030071Z #define __UINT64_FMTo__ "lo" 2025-05-07T19:46:54.0030173Z #define __UINT64_FMTu__ "lu" 2025-05-07T19:46:54.0030263Z #define __UINT64_FMTx__ "lx" 2025-05-07T19:46:54.0030370Z #define __UINT64_MAX__ 18446744073709551615UL 2025-05-07T19:46:54.0030475Z #define __UINT64_TYPE__ long unsigned int 2025-05-07T19:46:54.0030581Z #define __UINT8_C_SUFFIX__ 2025-05-07T19:46:54.0030667Z #define __UINT8_FMTX__ "hhX" 2025-05-07T19:46:54.0030758Z #define __UINT8_FMTo__ "hho" 2025-05-07T19:46:54.0030863Z #define __UINT8_FMTu__ "hhu" 2025-05-07T19:46:54.0030953Z #define __UINT8_FMTx__ "hhx" 2025-05-07T19:46:54.0031041Z #define __UINT8_MAX__ 255 2025-05-07T19:46:54.0031139Z #define __UINT8_TYPE__ unsigned char 2025-05-07T19:46:54.0031248Z #define __UINTMAX_C_SUFFIX__ UL 2025-05-07T19:46:54.0031346Z #define __UINTMAX_FMTX__ "lX" 2025-05-07T19:46:54.0031439Z #define __UINTMAX_FMTo__ "lo" 2025-05-07T19:46:54.0031543Z #define __UINTMAX_FMTu__ "lu" 2025-05-07T19:46:54.0031636Z #define __UINTMAX_FMTx__ "lx" 2025-05-07T19:46:54.0031745Z #define __UINTMAX_MAX__ 18446744073709551615UL 2025-05-07T19:46:54.0031859Z #define __UINTMAX_TYPE__ long unsigned int 2025-05-07T19:46:54.0031960Z #define __UINTMAX_WIDTH__ 64 2025-05-07T19:46:54.0032052Z #define __UINTPTR_FMTX__ "lX" 2025-05-07T19:46:54.0032142Z #define __UINTPTR_FMTo__ "lo" 2025-05-07T19:46:54.0032247Z #define __UINTPTR_FMTu__ "lu" 2025-05-07T19:46:54.0032337Z #define __UINTPTR_FMTx__ "lx" 2025-05-07T19:46:54.0032444Z #define __UINTPTR_MAX__ 18446744073709551615UL 2025-05-07T19:46:54.0032551Z #define __UINTPTR_TYPE__ long unsigned int 2025-05-07T19:46:54.0032655Z #define __UINTPTR_WIDTH__ 64 2025-05-07T19:46:54.0032748Z #define __UINT_FAST16_FMTX__ "hX" 2025-05-07T19:46:54.0032839Z #define __UINT_FAST16_FMTo__ "ho" 2025-05-07T19:46:54.0032947Z #define __UINT_FAST16_FMTu__ "hu" 2025-05-07T19:46:54.0033041Z #define __UINT_FAST16_FMTx__ "hx" 2025-05-07T19:46:54.0033132Z #define __UINT_FAST16_MAX__ 65535 2025-05-07T19:46:54.0033242Z #define __UINT_FAST16_TYPE__ unsigned short 2025-05-07T19:46:54.0033346Z #define __UINT_FAST32_FMTX__ "X" 2025-05-07T19:46:54.0033437Z #define __UINT_FAST32_FMTo__ "o" 2025-05-07T19:46:54.0033526Z #define __UINT_FAST32_FMTu__ "u" 2025-05-07T19:46:54.0033630Z #define __UINT_FAST32_FMTx__ "x" 2025-05-07T19:46:54.0033730Z #define __UINT_FAST32_MAX__ 4294967295U 2025-05-07T19:46:54.0033833Z #define __UINT_FAST32_TYPE__ unsigned int 2025-05-07T19:46:54.0033932Z #define __UINT_FAST64_FMTX__ "lX" 2025-05-07T19:46:54.0034023Z #define __UINT_FAST64_FMTo__ "lo" 2025-05-07T19:46:54.0034115Z #define __UINT_FAST64_FMTu__ "lu" 2025-05-07T19:46:54.0034206Z #define __UINT_FAST64_FMTx__ "lx" 2025-05-07T19:46:54.0034335Z #define __UINT_FAST64_MAX__ 18446744073709551615UL 2025-05-07T19:46:54.0034452Z #define __UINT_FAST64_TYPE__ long unsigned int 2025-05-07T19:46:54.0034598Z #define __UINT_FAST8_FMTX__ "hhX" 2025-05-07T19:46:54.0034702Z #define __UINT_FAST8_FMTo__ "hho" 2025-05-07T19:46:54.0034796Z #define __UINT_FAST8_FMTu__ "hhu" 2025-05-07T19:46:54.0034891Z #define __UINT_FAST8_FMTx__ "hhx" 2025-05-07T19:46:54.0034982Z #define __UINT_FAST8_MAX__ 255 2025-05-07T19:46:54.0035100Z #define __UINT_FAST8_TYPE__ unsigned char 2025-05-07T19:46:54.0035196Z #define __UINT_LEAST16_FMTX__ "hX" 2025-05-07T19:46:54.0035291Z #define __UINT_LEAST16_FMTo__ "ho" 2025-05-07T19:46:54.0035392Z #define __UINT_LEAST16_FMTu__ "hu" 2025-05-07T19:46:54.0035486Z #define __UINT_LEAST16_FMTx__ "hx" 2025-05-07T19:46:54.0035580Z #define __UINT_LEAST16_MAX__ 65535 2025-05-07T19:46:54.0035691Z #define __UINT_LEAST16_TYPE__ unsigned short 2025-05-07T19:46:54.0035794Z #define __UINT_LEAST32_FMTX__ "X" 2025-05-07T19:46:54.0035888Z #define __UINT_LEAST32_FMTo__ "o" 2025-05-07T19:46:54.0035982Z #define __UINT_LEAST32_FMTu__ "u" 2025-05-07T19:46:54.0036088Z #define __UINT_LEAST32_FMTx__ "x" 2025-05-07T19:46:54.0036189Z #define __UINT_LEAST32_MAX__ 4294967295U 2025-05-07T19:46:54.0036298Z #define __UINT_LEAST32_TYPE__ unsigned int 2025-05-07T19:46:54.0036391Z #define __UINT_LEAST64_FMTX__ "lX" 2025-05-07T19:46:54.0036544Z #define __UINT_LEAST64_FMTo__ "lo" 2025-05-07T19:46:54.0036640Z #define __UINT_LEAST64_FMTu__ "lu" 2025-05-07T19:46:54.0036733Z #define __UINT_LEAST64_FMTx__ "lx" 2025-05-07T19:46:54.0036862Z #define __UINT_LEAST64_MAX__ 18446744073709551615UL 2025-05-07T19:46:54.0036983Z #define __UINT_LEAST64_TYPE__ long unsigned int 2025-05-07T19:46:54.0037078Z #define __UINT_LEAST8_FMTX__ "hhX" 2025-05-07T19:46:54.0037181Z #define __UINT_LEAST8_FMTo__ "hho" 2025-05-07T19:46:54.0037274Z #define __UINT_LEAST8_FMTu__ "hhu" 2025-05-07T19:46:54.0037367Z #define __UINT_LEAST8_FMTx__ "hhx" 2025-05-07T19:46:54.0037459Z #define __UINT_LEAST8_MAX__ 255 2025-05-07T19:46:54.0037574Z #define __UINT_LEAST8_TYPE__ unsigned char 2025-05-07T19:46:54.0037673Z #define __ULONG32_TYPE unsigned int 2025-05-07T19:46:54.0037788Z #define __ULONGWORD_TYPE unsigned long int 2025-05-07T19:46:54.0037902Z #define __UQUAD_TYPE unsigned long int 2025-05-07T19:46:54.0037997Z #define __USECONDS_T_TYPE __U32_TYPE 2025-05-07T19:46:54.0038097Z #define __USER_LABEL_PREFIX__ 2025-05-07T19:46:54.0038182Z #define __USE_ANSI 1 2025-05-07T19:46:54.0038276Z #define __USE_ATFILE 1 2025-05-07T19:46:54.0038358Z #define __USE_BSD 1 2025-05-07T19:46:54.0038451Z #define __USE_FORTIFY_LEVEL 0 2025-05-07T19:46:54.0038542Z #define __USE_GNU 1 2025-05-07T19:46:54.0038625Z #define __USE_ISOC11 1 2025-05-07T19:46:54.0038711Z #define __USE_ISOC95 1 2025-05-07T19:46:54.0038794Z #define __USE_ISOC99 1 2025-05-07T19:46:54.0038891Z #define __USE_ISOCXX11 1 2025-05-07T19:46:54.0038979Z #define __USE_LARGEFILE 1 2025-05-07T19:46:54.0039069Z #define __USE_LARGEFILE64 1 2025-05-07T19:46:54.0039165Z #define __USE_MISC 1 2025-05-07T19:46:54.0039250Z #define __USE_POSIX 1 2025-05-07T19:46:54.0039341Z #define __USE_POSIX199309 1 2025-05-07T19:46:54.0039436Z #define __USE_POSIX199506 1 2025-05-07T19:46:54.0039531Z #define __USE_POSIX2 1 2025-05-07T19:46:54.0039614Z #define __USE_SVID 1 2025-05-07T19:46:54.0039698Z #define __USE_UNIX98 1 2025-05-07T19:46:54.0039786Z #define __USE_XOPEN 1 2025-05-07T19:46:54.0039884Z #define __USE_XOPEN2K 1 2025-05-07T19:46:54.0039972Z #define __USE_XOPEN2K8 1 2025-05-07T19:46:54.0040061Z #define __USE_XOPEN2K8XSI 1 2025-05-07T19:46:54.0040166Z #define __USE_XOPEN2KXSI 1 2025-05-07T19:46:54.0040258Z #define __USE_XOPEN_EXTENDED 1 2025-05-07T19:46:54.0040359Z #define __USING_NAMESPACE_C99(name) 2025-05-07T19:46:54.0040570Z #define __USING_NAMESPACE_STD(name) 2025-05-07T19:46:54.0040681Z #define __UWORD_TYPE unsigned long int 2025-05-07T19:46:54.0040877Z #define __VECTOR_FUNCTIONS_HPP__ 2025-05-07T19:46:54.0040969Z #define __VECTOR_FUNCTIONS_H__ 2025-05-07T19:46:54.0041062Z #define __VECTOR_TYPES_H__ 2025-05-07T19:46:54.0041468Z #define __VERSION__ "Clang 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:46:54.0041654Z #define __WAIT_INT(status) (*(int *) &(status)) 2025-05-07T19:46:54.0041754Z #define __WAIT_STATUS void * 2025-05-07T19:46:54.0041847Z #define __WAIT_STATUS_DEFN void * 2025-05-07T19:46:54.0041927Z #define __WALL 0x40000000 2025-05-07T19:46:54.0042013Z #define __WCHAR_MAX__ 2147483647 2025-05-07T19:46:54.0042114Z #define __WCHAR_TYPE__ int 2025-05-07T19:46:54.0042198Z #define __WCHAR_WIDTH__ 32 2025-05-07T19:46:54.0042284Z #define __WCLONE 0x80000000 2025-05-07T19:46:54.0042420Z #define __WCOREDUMP(status) ((status) & __WCOREFLAG) 2025-05-07T19:46:54.0042504Z #define __WCOREFLAG 0x80 2025-05-07T19:46:54.0042638Z #define __WEXITSTATUS(status) (((status) & 0xff00) >> 8) 2025-05-07T19:46:54.0042784Z #define __WIFCONTINUED(status) ((status) == __W_CONTINUED) 2025-05-07T19:46:54.0042924Z #define __WIFEXITED(status) (__WTERMSIG(status) == 0) 2025-05-07T19:46:54.0043130Z #define __WIFSIGNALED(status) (((signed char) (((status) & 0x7f) + 1) >> 1) > 0) 2025-05-07T19:46:54.0043267Z #define __WIFSTOPPED(status) (((status) & 0xff) == 0x7f) 2025-05-07T19:46:54.0043368Z #define __WINT_MAX__ 4294967295U 2025-05-07T19:46:54.0043458Z #define __WINT_TYPE__ unsigned int 2025-05-07T19:46:54.0043592Z #define __WINT_UNSIGNED__ 1 2025-05-07T19:46:54.0043674Z #define __WINT_WIDTH__ 32 2025-05-07T19:46:54.0043770Z #define __WNOTHREAD 0x20000000 2025-05-07T19:46:54.0043846Z #define __WORDSIZE 64 2025-05-07T19:46:54.0043937Z #define __WORDSIZE_TIME64_COMPAT32 1 2025-05-07T19:46:54.0044067Z #define __WSTOPSIG(status) __WEXITSTATUS(status) 2025-05-07T19:46:54.0044173Z #define __WTERMSIG(status) ((status) & 0x7f) 2025-05-07T19:46:54.0044255Z #define __W_CONTINUED 0xffff 2025-05-07T19:46:54.0044378Z #define __W_EXITCODE(ret,sig) ((ret) << 8 | (sig)) 2025-05-07T19:46:54.0044482Z #define __W_STOPCODE(sig) ((sig) << 8 | 0x7f) 2025-05-07T19:46:54.0044567Z #define ____FILE_defined 1 2025-05-07T19:46:54.0044654Z #define ____mbstate_t_defined 1 2025-05-07T19:46:54.0044779Z #define __align__(n) __attribute__((aligned(n))) 2025-05-07T19:46:54.0044956Z #define __always_inline __inline __attribute__ ((__always_inline__)) 2025-05-07T19:46:54.0045032Z #define __amd64 1 2025-05-07T19:46:54.0045125Z #define __amd64__ 1 2025-05-07T19:46:54.0045228Z #define __annotate__(a) __attribute__((a)) 2025-05-07T19:46:54.0045322Z #define __attribute_artificial__ 2025-05-07T19:46:54.0045452Z #define __attribute_const__ __attribute__ ((__const__)) 2025-05-07T19:46:54.0045632Z #define __attribute_deprecated__ __attribute__ ((__deprecated__)) 2025-05-07T19:46:54.0045820Z #define __attribute_format_arg__(x) __attribute__ ((__format_arg__ (x))) 2025-05-07T19:46:54.0046059Z #define __attribute_format_strfmon__(a,b) __attribute__ ((__format__ (__strfmon__, a, b))) 2025-05-07T19:46:54.0046211Z #define __attribute_malloc__ __attribute__ ((__malloc__)) 2025-05-07T19:46:54.0046361Z #define __attribute_noinline__ __attribute__ ((__noinline__)) 2025-05-07T19:46:54.0046486Z #define __attribute_pure__ __attribute__ ((__pure__)) 2025-05-07T19:46:54.0046619Z #define __attribute_used__ __attribute__ ((__used__)) 2025-05-07T19:46:54.0046837Z #define __attribute_warn_unused_result__ __attribute__ ((__warn_unused_result__)) 2025-05-07T19:46:54.0046926Z #define __blkcnt_t_defined 2025-05-07T19:46:54.0047009Z #define __blksize_t_defined 2025-05-07T19:46:54.0047193Z #define __bos(ptr) __builtin_object_size (ptr, __USE_FORTIFY_LEVEL > 1) 2025-05-07T19:46:54.0047314Z #define __bos0(ptr) __builtin_object_size (ptr, 0) 2025-05-07T19:46:54.0047392Z #define __bounded 2025-05-07T19:46:54.0047979Z #define __bswap_16(x) (__extension__ ({ unsigned short int __v, __x = (unsigned short int) (x); if (__builtin_constant_p (__x)) __v = __bswap_constant_16 (__x); else __asm__ ("rorw $8, %w0" : "=r" (__v) : "0" (__x) : "cc"); __v; })) 2025-05-07T19:46:54.0048435Z #define __bswap_32(x) (__extension__ ({ unsigned int __v, __x = (x); if (__builtin_constant_p (__x)) __v = __bswap_constant_32 (__x); else __asm__ ("bswap %0" : "=r" (__v) : "0" (__x)); __v; })) 2025-05-07T19:46:54.0048938Z #define __bswap_64(x) (__extension__ ({ __uint64_t __v, __x = (x); if (__builtin_constant_p (__x)) __v = __bswap_constant_64 (__x); else __asm__ ("bswap %q0" : "=r" (__v) : "0" (__x)); __v; })) 2025-05-07T19:46:54.0049192Z #define __bswap_constant_16(x) ((unsigned short int) ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))) 2025-05-07T19:46:54.0049500Z #define __bswap_constant_32(x) ((((x) & 0xff000000) >> 24) | (((x) & 0x00ff0000) >> 8) | (((x) & 0x0000ff00) << 8) | (((x) & 0x000000ff) << 24)) 2025-05-07T19:46:54.0050407Z #define __bswap_constant_64(x) (__extension__ ((((x) & 0xff00000000000000ull) >> 56) | (((x) & 0x00ff000000000000ull) >> 40) | (((x) & 0x0000ff0000000000ull) >> 24) | (((x) & 0x000000ff00000000ull) >> 8) | (((x) & 0x00000000ff000000ull) << 8) | (((x) & 0x0000000000ff0000ull) << 24) | (((x) & 0x000000000000ff00ull) << 40) | (((x) & 0x00000000000000ffull) << 56))) 2025-05-07T19:46:54.0050514Z #define __builtin_align__(a) __align__(a) 2025-05-07T19:46:54.0050602Z #define __catch(X) catch(X) 2025-05-07T19:46:54.0050674Z #define __cdecl 2025-05-07T19:46:54.0050767Z #define __clang__ 1 2025-05-07T19:46:54.0050872Z #define __clang_literal_encoding__ "UTF-8" 2025-05-07T19:46:54.0051010Z #define __clang_major__ 16 2025-05-07T19:46:54.0051106Z #define __clang_minor__ 0 2025-05-07T19:46:54.0051195Z #define __clang_patchlevel__ 6 2025-05-07T19:46:54.0051591Z #define __clang_version__ "16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4)" 2025-05-07T19:46:54.0051712Z #define __clang_wide_literal_encoding__ "UTF-32" 2025-05-07T19:46:54.0051813Z #define __clock_t_defined 1 2025-05-07T19:46:54.0051902Z #define __clockid_t_defined 1 2025-05-07T19:46:54.0052088Z #define __cluster_dims__(...) __attribute__((cluster_dims(__VA_ARGS__))) 2025-05-07T19:46:54.0052194Z #define __code_model_small__ 1 2025-05-07T19:46:54.0052297Z #define __constant__ __location__(constant) 2025-05-07T19:46:54.0052386Z #define __cplusplus 201703L 2025-05-07T19:46:54.0052508Z #define __cpp_aggregate_bases 201603L 2025-05-07T19:46:54.0052606Z #define __cpp_aggregate_nsdmi 201304L 2025-05-07T19:46:54.0052705Z #define __cpp_alias_templates 200704L 2025-05-07T19:46:54.0052802Z #define __cpp_aligned_new 201606L 2025-05-07T19:46:54.0052911Z #define __cpp_attributes 200809L 2025-05-07T19:46:54.0053006Z #define __cpp_binary_literals 201304L 2025-05-07T19:46:54.0053105Z #define __cpp_capture_star_this 201603L 2025-05-07T19:46:54.0053212Z #define __cpp_constexpr 201603L 2025-05-07T19:46:54.0053321Z #define __cpp_constexpr_in_decltype 201711L 2025-05-07T19:46:54.0053414Z #define __cpp_decltype 200707L 2025-05-07T19:46:54.0053511Z #define __cpp_decltype_auto 201304L 2025-05-07T19:46:54.0053627Z #define __cpp_deduction_guides 201703L 2025-05-07T19:46:54.0053739Z #define __cpp_delegating_constructors 200604L 2025-05-07T19:46:54.0053838Z #define __cpp_digit_separators 201309L 2025-05-07T19:46:54.0053959Z #define __cpp_enumerator_attributes 201411L 2025-05-07T19:46:54.0054059Z #define __cpp_exceptions 199711L 2025-05-07T19:46:54.0054160Z #define __cpp_fold_expressions 201603L 2025-05-07T19:46:54.0054258Z #define __cpp_generic_lambdas 201304L 2025-05-07T19:46:54.0054389Z #define __cpp_guaranteed_copy_elision 201606L 2025-05-07T19:46:54.0054478Z #define __cpp_hex_float 201603L 2025-05-07T19:46:54.0054570Z #define __cpp_if_constexpr 201606L 2025-05-07T19:46:54.0054695Z #define __cpp_impl_destroying_delete 201806L 2025-05-07T19:46:54.0054809Z #define __cpp_inheriting_constructors 201511L 2025-05-07T19:46:54.0054899Z #define __cpp_init_captures 201304L 2025-05-07T19:46:54.0054997Z #define __cpp_initializer_lists 200806L 2025-05-07T19:46:54.0055112Z #define __cpp_inline_variables 201606L 2025-05-07T19:46:54.0055273Z #define __cpp_lambdas 200907L 2025-05-07T19:46:54.0055386Z #define __cpp_lib_addressof_constexpr 201603 2025-05-07T19:46:54.0055506Z #define __cpp_lib_array_constexpr 201803L 2025-05-07T19:46:54.0055767Z #define __cpp_lib_as_const 201510 2025-05-07T19:46:54.0055940Z #define __cpp_lib_bool_constant 201505 2025-05-07T19:46:54.0056059Z #define __cpp_lib_exchange_function 201304 2025-05-07T19:46:54.0056222Z #define __cpp_lib_has_unique_object_representations 201606 2025-05-07T19:46:54.0056327Z #define __cpp_lib_hypot 201603 2025-05-07T19:46:54.0056431Z #define __cpp_lib_integer_sequence 201304 2025-05-07T19:46:54.0056577Z #define __cpp_lib_integral_constant_callable 201304 2025-05-07T19:46:54.0056682Z #define __cpp_lib_is_aggregate 201703 2025-05-07T19:46:54.0056779Z #define __cpp_lib_is_final 201402L 2025-05-07T19:46:54.0056888Z #define __cpp_lib_is_invocable 201703 2025-05-07T19:46:54.0056994Z #define __cpp_lib_is_null_pointer 201309 2025-05-07T19:46:54.0057097Z #define __cpp_lib_is_swappable 201603 2025-05-07T19:46:54.0057195Z #define __cpp_lib_launder 201606 2025-05-07T19:46:54.0057313Z #define __cpp_lib_logical_traits 201510 2025-05-07T19:46:54.0057434Z #define __cpp_lib_make_reverse_iterator 201402 2025-05-07T19:46:54.0057560Z #define __cpp_lib_math_special_functions 201603L 2025-05-07T19:46:54.0057683Z #define __cpp_lib_result_of_sfinae 201210 2025-05-07T19:46:54.0057820Z #define __cpp_lib_robust_nonmodifying_seq_ops 201304 2025-05-07T19:46:54.0058017Z #define __cpp_lib_transformation_trait_aliases 201304 2025-05-07T19:46:54.0058127Z #define __cpp_lib_tuple_element_t 201402L 2025-05-07T19:46:54.0058242Z #define __cpp_lib_tuples_by_type 201304 2025-05-07T19:46:54.0058388Z #define __cpp_lib_type_trait_variable_templates 201510L 2025-05-07T19:46:54.0058484Z #define __cpp_lib_void_t 201411 2025-05-07T19:46:54.0058611Z #define __cpp_named_character_escapes 202207L 2025-05-07T19:46:54.0058721Z #define __cpp_namespace_attributes 201411L 2025-05-07T19:46:54.0058853Z #define __cpp_nested_namespace_definitions 201411L 2025-05-07T19:46:54.0058979Z #define __cpp_noexcept_function_type 201510L 2025-05-07T19:46:54.0059091Z #define __cpp_nontype_template_args 201411L 2025-05-07T19:46:54.0059235Z #define __cpp_nontype_template_parameter_auto 201606L 2025-05-07T19:46:54.0059328Z #define __cpp_nsdmi 200809L 2025-05-07T19:46:54.0059445Z #define __cpp_range_based_for 201603L 2025-05-07T19:46:54.0059541Z #define __cpp_raw_strings 200710L 2025-05-07T19:46:54.0059640Z #define __cpp_ref_qualifiers 200710L 2025-05-07T19:46:54.0059766Z #define __cpp_return_type_deduction 201304L 2025-05-07T19:46:54.0059857Z #define __cpp_rtti 199711L 2025-05-07T19:46:54.0059962Z #define __cpp_rvalue_references 200610L 2025-05-07T19:46:54.0060064Z #define __cpp_static_assert 201411L 2025-05-07T19:46:54.0060187Z #define __cpp_static_call_operator 202207L 2025-05-07T19:46:54.0060297Z #define __cpp_structured_bindings 201606L 2025-05-07T19:46:54.0060398Z #define __cpp_template_auto 201606L 2025-05-07T19:46:54.0060528Z #define __cpp_threadsafe_static_init 200806L 2025-05-07T19:46:54.0060631Z #define __cpp_unicode_characters 200704L 2025-05-07T19:46:54.0060731Z #define __cpp_unicode_literals 200710L 2025-05-07T19:46:54.0060841Z #define __cpp_user_defined_literals 200809L 2025-05-07T19:46:54.0060962Z #define __cpp_variable_templates 201304L 2025-05-07T19:46:54.0061070Z #define __cpp_variadic_templates 200704L 2025-05-07T19:46:54.0061172Z #define __cpp_variadic_using 201611L 2025-05-07T19:46:54.0061294Z #define __cudaCDP2DeviceGetAttribute 2025-05-07T19:46:54.0061406Z #define __cudaCDP2DeviceGetCacheConfig 2025-05-07T19:46:54.0061510Z #define __cudaCDP2DeviceGetLimit 2025-05-07T19:46:54.0061641Z #define __cudaCDP2DeviceGetSharedMemConfig 2025-05-07T19:46:54.0061753Z #define __cudaCDP2EventCreateWithFlags 2025-05-07T19:46:54.0061852Z #define __cudaCDP2EventDestroy 2025-05-07T19:46:54.0061951Z #define __cudaCDP2EventRecord 2025-05-07T19:46:54.0062077Z #define __cudaCDP2EventRecordWithFlags 2025-05-07T19:46:54.0062205Z #define __cudaCDP2EventRecordWithFlags_ptsz 2025-05-07T19:46:54.0062310Z #define __cudaCDP2EventRecord_ptsz 2025-05-07T19:46:54.0062414Z #define __cudaCDP2Free 2025-05-07T19:46:54.0062522Z #define __cudaCDP2FuncGetAttributes 2025-05-07T19:46:54.0062622Z #define __cudaCDP2GetDevice 2025-05-07T19:46:54.0062778Z #define __cudaCDP2GetDeviceCount 2025-05-07T19:46:54.0062889Z #define __cudaCDP2GetErrorName 2025-05-07T19:46:54.0062993Z #define __cudaCDP2GetErrorString 2025-05-07T19:46:54.0063090Z #define __cudaCDP2GetLastError 2025-05-07T19:46:54.0063217Z #define __cudaCDP2GetParameterBuffer 2025-05-07T19:46:54.0063329Z #define __cudaCDP2GetParameterBufferV2 2025-05-07T19:46:54.0063428Z #define __cudaCDP2LaunchDevice 2025-05-07T19:46:54.0063528Z #define __cudaCDP2LaunchDeviceV2 2025-05-07T19:46:54.0063652Z #define __cudaCDP2LaunchDeviceV2_ptsz 2025-05-07T19:46:54.0063762Z #define __cudaCDP2LaunchDevice_ptsz 2025-05-07T19:46:54.0063853Z #define __cudaCDP2Malloc 2025-05-07T19:46:54.0063966Z #define __cudaCDP2Memcpy2DAsync 2025-05-07T19:46:54.0064075Z #define __cudaCDP2Memcpy2DAsync_ptsz 2025-05-07T19:46:54.0064178Z #define __cudaCDP2Memcpy3DAsync 2025-05-07T19:46:54.0064282Z #define __cudaCDP2Memcpy3DAsync_ptsz 2025-05-07T19:46:54.0064429Z #define __cudaCDP2MemcpyAsync 2025-05-07T19:46:54.0064536Z #define __cudaCDP2MemcpyAsync_ptsz 2025-05-07T19:46:54.0064638Z #define __cudaCDP2Memset2DAsync 2025-05-07T19:46:54.0064754Z #define __cudaCDP2Memset2DAsync_ptsz 2025-05-07T19:46:54.0064853Z #define __cudaCDP2Memset3DAsync 2025-05-07T19:46:54.0065023Z #define __cudaCDP2Memset3DAsync_ptsz 2025-05-07T19:46:54.0065124Z #define __cudaCDP2MemsetAsync 2025-05-07T19:46:54.0065237Z #define __cudaCDP2MemsetAsync_ptsz 2025-05-07T19:46:54.0065436Z #define __cudaCDP2OccupancyMaxActiveBlocksPerMultiprocessor 2025-05-07T19:46:54.0065682Z #define __cudaCDP2OccupancyMaxActiveBlocksPerMultiprocessorWithFlags 2025-05-07T19:46:54.0065798Z #define __cudaCDP2PeekAtLastError 2025-05-07T19:46:54.0065903Z #define __cudaCDP2RuntimeGetVersion 2025-05-07T19:46:54.0066023Z #define __cudaCDP2StreamCreateWithFlags 2025-05-07T19:46:54.0066127Z #define __cudaCDP2StreamDestroy 2025-05-07T19:46:54.0066242Z #define __cudaCDP2StreamWaitEvent 2025-05-07T19:46:54.0066354Z #define __cudaCDP2StreamWaitEvent_ptsz 2025-05-07T19:46:54.0066462Z #define __cudaGet_blockDim() blockDim 2025-05-07T19:46:54.0066571Z #define __cudaGet_blockIdx() blockIdx 2025-05-07T19:46:54.0066666Z #define __cudaGet_gridDim() gridDim 2025-05-07T19:46:54.0066775Z #define __cudaGet_threadIdx() threadIdx 2025-05-07T19:46:54.0066886Z #define __cudaGet_warpSize() warpSize 2025-05-07T19:46:54.0067035Z #define __cudart_builtin__ __location__(cudart_builtin) 2025-05-07T19:46:54.0067127Z #define __daddr_t_defined 2025-05-07T19:46:54.0067216Z #define __dev_t_defined 2025-05-07T19:46:54.0067329Z #define __device__ __location__(device) 2025-05-07T19:46:54.0067472Z #define __device_builtin__ __location__(device_builtin) 2025-05-07T19:46:54.0067713Z #define __device_builtin_surface_type__ __location__(device_builtin_surface_type) 2025-05-07T19:46:54.0067960Z #define __device_builtin_texture_type__ __location__(device_builtin_texture_type) 2025-05-07T19:46:54.0068101Z #define __errordecl(name,msg) extern void name (void) 2025-05-07T19:46:54.0068237Z #define __exctype(name) extern int name (int) __THROW 2025-05-07T19:46:54.0068439Z #define __exctype_l(name) extern int name (int, __locale_t) __THROW 2025-05-07T19:46:54.0068523Z #define __export__ 2025-05-07T19:46:54.0068777Z #define __extern_always_inline extern __always_inline __attribute__ ((__gnu_inline__)) 2025-05-07T19:46:54.0068981Z #define __extern_inline extern __inline __attribute__ ((__gnu_inline__)) 2025-05-07T19:46:54.0069079Z #define __flexarr [] 2025-05-07T19:46:54.0069258Z #define __forceinline__ __inline__ __attribute__((always_inline)) 2025-05-07T19:46:54.0069473Z #define __fortify_function __extern_always_inline __attribute_artificial__ 2025-05-07T19:46:54.0069584Z #define __fsblkcnt_t_defined 2025-05-07T19:46:54.0069680Z #define __fsfilcnt_t_defined 2025-05-07T19:46:54.0069766Z #define __gid_t_defined 2025-05-07T19:46:54.0069916Z #define __glibc_likely(cond) __builtin_expect((cond), 1) 2025-05-07T19:46:54.0070084Z #define __glibc_unlikely(cond) __builtin_expect((cond), 0) 2025-05-07T19:46:54.0070320Z #define __glibcxx_assert(cond) do { __glibcxx_constexpr_assert(cond); } while (false) 2025-05-07T19:46:54.0070486Z #define __glibcxx_class_requires(_a,_b) 2025-05-07T19:46:54.0070614Z #define __glibcxx_class_requires2(_a,_b,_c) 2025-05-07T19:46:54.0070739Z #define __glibcxx_class_requires3(_a,_b,_c,_d) 2025-05-07T19:46:54.0070863Z #define __glibcxx_class_requires4(_a,_b,_c,_d,_e) 2025-05-07T19:46:54.0071246Z #define __glibcxx_constexpr_assert(cond) if (__builtin_is_constant_evaluated() && !bool(cond)) __builtin_unreachable() 2025-05-07T19:46:54.0071452Z #define __glibcxx_digits10_b(T,B) (__glibcxx_digits_b (T,B) * 643L / 2136) 2025-05-07T19:46:54.0071794Z #define __glibcxx_digits_b(T,B) (B - __glibcxx_signed_b (T,B)) 2025-05-07T19:46:54.0071902Z #define __glibcxx_function_requires(...) 2025-05-07T19:46:54.0072020Z #define __glibcxx_integral_traps true 2025-05-07T19:46:54.0072324Z #define __glibcxx_max_b(T,B) (__glibcxx_signed_b (T,B) ? (((((T)1 << (__glibcxx_digits_b (T,B) - 1)) - 1) << 1) + 1) : ~(T)0) 2025-05-07T19:46:54.0072571Z #define __glibcxx_min_b(T,B) (__glibcxx_signed_b (T,B) ? -__glibcxx_max_b (T,B) - 1 : (T)0) 2025-05-07T19:46:54.0072779Z #define __glibcxx_requires_can_decrement_range(_First1,_Last1,_First2) 2025-05-07T19:46:54.0072980Z #define __glibcxx_requires_can_increment(_First,_Size) 2025-05-07T19:46:54.0073177Z #define __glibcxx_requires_can_increment_range(_First1,_Last1,_First2) 2025-05-07T19:46:54.0073299Z #define __glibcxx_requires_cond(_Cond,_Msg) 2025-05-07T19:46:54.0073417Z #define __glibcxx_requires_heap(_First,_Last) 2025-05-07T19:46:54.0073570Z #define __glibcxx_requires_heap_pred(_First,_Last,_Pred) 2025-05-07T19:46:54.0073721Z #define __glibcxx_requires_irreflexive(_First,_Last) 2025-05-07T19:46:54.0073863Z #define __glibcxx_requires_irreflexive2(_First,_Last) 2025-05-07T19:46:54.0074039Z #define __glibcxx_requires_irreflexive_pred(_First,_Last,_Pred) 2025-05-07T19:46:54.0074219Z #define __glibcxx_requires_irreflexive_pred2(_First,_Last,_Pred) 2025-05-07T19:46:54.0074386Z #define __glibcxx_requires_non_empty_range(_First,_Last) 2025-05-07T19:46:54.0074489Z #define __glibcxx_requires_nonempty() 2025-05-07T19:46:54.0074856Z #define __glibcxx_requires_partitioned_lower(_First,_Last,_Value) 2025-05-07T19:46:54.0075274Z #define __glibcxx_requires_partitioned_lower_pred(_First,_Last,_Value,_Pred) 2025-05-07T19:46:54.0075464Z #define __glibcxx_requires_partitioned_upper(_First,_Last,_Value) 2025-05-07T19:46:54.0075693Z #define __glibcxx_requires_partitioned_upper_pred(_First,_Last,_Value,_Pred) 2025-05-07T19:46:54.0075837Z #define __glibcxx_requires_sorted(_First,_Last) 2025-05-07T19:46:54.0075998Z #define __glibcxx_requires_sorted_pred(_First,_Last,_Pred) 2025-05-07T19:46:54.0076186Z #define __glibcxx_requires_sorted_set(_First1,_Last1,_First2) 2025-05-07T19:46:54.0076398Z #define __glibcxx_requires_sorted_set_pred(_First1,_Last1,_First2,_Pred) 2025-05-07T19:46:54.0076527Z #define __glibcxx_requires_string(_String) 2025-05-07T19:46:54.0076664Z #define __glibcxx_requires_string_len(_String,_Len) 2025-05-07T19:46:54.0076781Z #define __glibcxx_requires_subscript(_N) 2025-05-07T19:46:54.0076937Z #define __glibcxx_requires_valid_range(_First,_Last) 2025-05-07T19:46:54.0077057Z #define __glibcxx_signed_b(T,B) ((T)(-1) < 0) 2025-05-07T19:46:54.0077160Z #define __global__ __location__(global) 2025-05-07T19:46:54.0077269Z #define __gnu_linux__ 1 2025-05-07T19:46:54.0077405Z #define __grid_constant__ __location__(grid_constant) 2025-05-07T19:46:54.0077504Z #define __have_pthread_attr_t 1 2025-05-07T19:46:54.0077603Z #define __host__ __location__(host) 2025-05-07T19:46:54.0077705Z #define __id_t_defined 2025-05-07T19:46:54.0077789Z #define __import__ 2025-05-07T19:46:54.0077935Z #define __inline_hint__ __attribute__((nv_inline_hint)) 2025-05-07T19:46:54.0078042Z #define __ino64_t_defined 2025-05-07T19:46:54.0078132Z #define __ino_t_defined 2025-05-07T19:46:54.0078220Z #define __int8_t_defined 2025-05-07T19:46:54.0078447Z #define __intN_t(N,MODE) typedef int int##N##_t __attribute__ ((__mode__ (MODE))) 2025-05-07T19:46:54.0078728Z #define __isalnum_l(c,l) __isctype_l((c), _ISalnum, (l)) 2025-05-07T19:46:54.0078875Z #define __isalpha_l(c,l) __isctype_l((c), _ISalpha, (l)) 2025-05-07T19:46:54.0078980Z #define __isascii(c) (((c) & ~0x7f) == 0) 2025-05-07T19:46:54.0079109Z #define __isascii_l(c,l) ((l), __isascii (c)) 2025-05-07T19:46:54.0079256Z #define __isblank_l(c,l) __isctype_l((c), _ISblank, (l)) 2025-05-07T19:46:54.0079399Z #define __iscntrl_l(c,l) __isctype_l((c), _IScntrl, (l)) 2025-05-07T19:46:54.0079674Z #define __isctype_l(c,type,locale) ((locale)->__ctype_b[(int) (c)] & (unsigned short int) type) 2025-05-07T19:46:54.0079850Z #define __isdigit_l(c,l) __isctype_l((c), _ISdigit, (l)) 2025-05-07T19:46:54.0079993Z #define __isgraph_l(c,l) __isctype_l((c), _ISgraph, (l)) 2025-05-07T19:46:54.0080189Z #define __isleap(year) ((year) % 4 == 0 && ((year) % 100 != 0 || (year) % 400 == 0)) 2025-05-07T19:46:54.0080352Z #define __islower_l(c,l) __isctype_l((c), _ISlower, (l)) 2025-05-07T19:46:54.0080506Z #define __isprint_l(c,l) __isctype_l((c), _ISprint, (l)) 2025-05-07T19:46:54.0080651Z #define __ispunct_l(c,l) __isctype_l((c), _ISpunct, (l)) 2025-05-07T19:46:54.0080810Z #define __isspace_l(c,l) __isctype_l((c), _ISspace, (l)) 2025-05-07T19:46:54.0081070Z #define __isupper_l(c,l) __isctype_l((c), _ISupper, (l)) 2025-05-07T19:46:54.0081228Z #define __isxdigit_l(c,l) __isctype_l((c), _ISxdigit, (l)) 2025-05-07T19:46:54.0081311Z #define __k8 1 2025-05-07T19:46:54.0081409Z #define __k8__ 1 2025-05-07T19:46:54.0081500Z #define __key_t_defined 2025-05-07T19:46:54.0081695Z #define __launch_bounds__(...) __annotate__(launch_bounds(__VA_ARGS__)) 2025-05-07T19:46:54.0081802Z #define __ldiv_t_defined 1 2025-05-07T19:46:54.0081889Z #define __linux 1 2025-05-07T19:46:54.0081975Z #define __linux__ 1 2025-05-07T19:46:54.0082067Z #define __lldiv_t_defined 1 2025-05-07T19:46:54.0082167Z #define __llvm__ 1 2025-05-07T19:46:54.0082274Z #define __location__(a) __annotate__(a) 2025-05-07T19:46:54.0082375Z #define __long_double_t long double 2025-05-07T19:46:54.0082498Z #define __malloc_and_calloc_defined 2025-05-07T19:46:54.0082606Z #define __managed__ __location__(managed) 2025-05-07T19:46:54.0082737Z #define __maxnreg__(a) __attribute__((maxnreg(a))) 2025-05-07T19:46:54.0082839Z #define __mode_t_defined 2025-05-07T19:46:54.0082929Z #define __need_IOV_MAX 2025-05-07T19:46:54.0083022Z #define __need_clockid_t 2025-05-07T19:46:54.0083113Z #define __nlink_t_defined 2025-05-07T19:46:54.0083253Z #define __no_return__ __attribute__((noreturn)) 2025-05-07T19:46:54.0083374Z #define __noinline__ __attribute__((noinline)) 2025-05-07T19:46:54.0083547Z #define __nonnull(params) __attribute__ ((__nonnull__ params)) 2025-05-07T19:46:54.0083671Z #define __nv_pure__ __location__(nv_pure) 2025-05-07T19:46:54.0083760Z #define __off64_t_defined 2025-05-07T19:46:54.0083850Z #define __off_t_defined 2025-05-07T19:46:54.0083935Z #define __pic__ 2 2025-05-07T19:46:54.0084029Z #define __pid_t_defined 2025-05-07T19:46:54.0084111Z #define __pie__ 2 2025-05-07T19:46:54.0084215Z #define __private_extern__ extern 2025-05-07T19:46:54.0084307Z #define __ptr_t void * 2025-05-07T19:46:54.0084393Z #define __ptrvalue 2025-05-07T19:46:54.0084479Z #define __restrict_arr 2025-05-07T19:46:54.0084618Z #define __seg_fs __attribute__((address_space(257))) 2025-05-07T19:46:54.0084757Z #define __seg_gs __attribute__((address_space(256))) 2025-05-07T19:46:54.0084857Z #define __shared__ __location__(shared) 2025-05-07T19:46:54.0084950Z #define __sigset_t_defined 2025-05-07T19:46:54.0085062Z #define __specialization_static 2025-05-07T19:46:54.0085150Z #define __ssize_t_defined 2025-05-07T19:46:54.0085237Z #define __stub_bdflush 2025-05-07T19:46:54.0085325Z #define __stub_chflags 2025-05-07T19:46:54.0085419Z #define __stub_fattach 2025-05-07T19:46:54.0085510Z #define __stub_fchflags 2025-05-07T19:46:54.0085597Z #define __stub_fdetach 2025-05-07T19:46:54.0085703Z #define __stub_getmsg 2025-05-07T19:46:54.0085784Z #define __stub_gtty 2025-05-07T19:46:54.0085922Z #define __stub_lchmod 2025-05-07T19:46:54.0086009Z #define __stub_putmsg 2025-05-07T19:46:54.0086105Z #define __stub_revoke 2025-05-07T19:46:54.0086192Z #define __stub_setlogin 2025-05-07T19:46:54.0086286Z #define __stub_sigreturn 2025-05-07T19:46:54.0086383Z #define __stub_sstk 2025-05-07T19:46:54.0086463Z #define __stub_stty 2025-05-07T19:46:54.0086558Z #define __suseconds_t_defined 2025-05-07T19:46:54.0086648Z #define __thread__ __thread 2025-05-07T19:46:54.0086761Z #define __throw_exception_again throw 2025-05-07T19:46:54.0086852Z #define __time_t_defined 1 2025-05-07T19:46:54.0086944Z #define __timer_t_defined 1 2025-05-07T19:46:54.0087054Z #define __timespec_defined 1 2025-05-07T19:46:54.0087148Z #define __toascii(c) ((c) & 0x7f) 2025-05-07T19:46:54.0087263Z #define __toascii_l(c,l) ((l), __toascii (c)) 2025-05-07T19:46:54.0087954Z #define __tobody(c,f,a,args) (__extension__ ({ int __res; if (sizeof (c) > 1) { if (__builtin_constant_p (c)) { int __c = (c); __res = __c < -128 || __c > 255 ? __c : (a)[__c]; } else __res = f args; } else __res = (a)[(int) (c)]; __res; })) 2025-05-07T19:46:54.0088213Z #define __try try 2025-05-07T19:46:54.0088299Z #define __tune_k8__ 1 2025-05-07T19:46:54.0088389Z #define __u_char_defined 2025-05-07T19:46:54.0088726Z #define __u_intN_t(N,MODE) typedef unsigned int u_int##N##_t __attribute__ ((__mode__ (MODE))) 2025-05-07T19:46:54.0088814Z #define __uid_t_defined 2025-05-07T19:46:54.0088896Z #define __unbounded 2025-05-07T19:46:54.0088978Z #define __unix 1 2025-05-07T19:46:54.0089073Z #define __unix__ 1 2025-05-07T19:46:54.0089166Z #define __useconds_t_defined 2025-05-07T19:46:54.0089256Z #define __warnattr(msg) 2025-05-07T19:46:54.0089408Z #define __warndecl(name,msg) extern void name (void) 2025-05-07T19:46:54.0089490Z #define __wur 2025-05-07T19:46:54.0089574Z #define __x86_64 1 2025-05-07T19:46:54.0089659Z #define __x86_64__ 1 2025-05-07T19:46:54.0089845Z #define _tolower(c) ((int) (*__ctype_tolower_loc ())[(int) (c)]) 2025-05-07T19:46:54.0090020Z #define _toupper(c) ((int) (*__ctype_toupper_loc ())[(int) (c)]) 2025-05-07T19:46:54.0090141Z #define alloca(size) __builtin_alloca (size) 2025-05-07T19:46:54.0090520Z #define assert(expr) ((expr) ? __ASSERT_VOID_CAST (0) : __assert_fail (__STRING(expr), __FILE__, __LINE__, __ASSERT_FUNCTION)) 2025-05-07T19:46:54.0090934Z #define assert_perror(errnum) (!(errnum) ? __ASSERT_VOID_CAST (0) : __assert_perror_fail ((errnum), __FILE__, __LINE__, __ASSERT_FUNCTION)) 2025-05-07T19:46:54.0091035Z #define be16toh(x) __bswap_16 (x) 2025-05-07T19:46:54.0091146Z #define be32toh(x) __bswap_32 (x) 2025-05-07T19:46:54.0091242Z #define be64toh(x) __bswap_64 (x) 2025-05-07T19:46:54.0091356Z #define cudaArrayColorAttachment 0x20 2025-05-07T19:46:54.0091458Z #define cudaArrayCubemap 0x04 2025-05-07T19:46:54.0091568Z #define cudaArrayDefault 0x00 2025-05-07T19:46:54.0091681Z #define cudaArrayDeferredMapping 0x80 2025-05-07T19:46:54.0091776Z #define cudaArrayLayered 0x01 2025-05-07T19:46:54.0091887Z #define cudaArraySparse 0x40 2025-05-07T19:46:54.0092044Z #define cudaArraySparsePropertiesSingleMipTail 0x1 2025-05-07T19:46:54.0092161Z #define cudaArraySurfaceLoadStore 0x02 2025-05-07T19:46:54.0092283Z #define cudaArrayTextureGather 0x08 2025-05-07T19:46:54.0092467Z #define cudaCooperativeLaunchMultiDeviceNoPostSync 0x02 2025-05-07T19:46:54.0092641Z #define cudaCooperativeLaunchMultiDeviceNoPreSync 0x01 2025-05-07T19:46:54.0092744Z #define cudaCpuDeviceId ((int)-1) 2025-05-07T19:46:54.0092863Z #define cudaDeviceBlockingSync 0x04 2025-05-07T19:46:54.0092976Z #define cudaDeviceLmemResizeToMax 0x10 2025-05-07T19:46:54.0093077Z #define cudaDeviceMapHost 0x08 2025-05-07T19:46:54.0093183Z #define cudaDeviceMask 0xff 2025-05-07T19:46:54.0093287Z #define cudaDeviceScheduleAuto 0x00 2025-05-07T19:46:54.0093412Z #define cudaDeviceScheduleBlockingSync 0x04 2025-05-07T19:46:54.0093515Z #define cudaDeviceScheduleMask 0x07 2025-05-07T19:46:54.0093630Z #define cudaDeviceScheduleSpin 0x01 2025-05-07T19:46:54.0093740Z #define cudaDeviceScheduleYield 0x02 2025-05-07T19:46:54.0094719Z #define cudaDeviceSyncMemops 0x80 2025-05-07T19:46:54.0094840Z #define cudaEventBlockingSync 0x01 2025-05-07T19:46:54.0094938Z #define cudaEventDefault 0x00 2025-05-07T19:46:54.0095049Z #define cudaEventDisableTiming 0x02 2025-05-07T19:46:54.0095228Z #define cudaEventInterprocess 0x04 2025-05-07T19:46:54.0095358Z #define cudaEventRecordDefault 0x00 2025-05-07T19:46:54.0095468Z #define cudaEventRecordExternal 0x01 2025-05-07T19:46:54.0095573Z #define cudaEventWaitDefault 0x00 2025-05-07T19:46:54.0095693Z #define cudaEventWaitExternal 0x01 2025-05-07T19:46:54.0095806Z #define cudaExternalMemoryDedicated 0x1 2025-05-07T19:46:54.0096005Z #define cudaExternalSemaphoreSignalSkipNvSciBufMemSync 0x01 2025-05-07T19:46:54.0096189Z #define cudaExternalSemaphoreWaitSkipNvSciBufMemSync 0x02 2025-05-07T19:46:54.0096382Z #define cudaGetDeviceProperties cudaGetDeviceProperties_v2 2025-05-07T19:46:54.0096506Z #define cudaGraphKernelNodePortDefault 0 2025-05-07T19:46:54.0096657Z #define cudaGraphKernelNodePortLaunchCompletion 2 2025-05-07T19:46:54.0096809Z #define cudaGraphKernelNodePortProgrammatic 1 2025-05-07T19:46:54.0096914Z #define cudaHostAllocDefault 0x00 2025-05-07T19:46:54.0097022Z #define cudaHostAllocMapped 0x02 2025-05-07T19:46:54.0097220Z #define cudaHostAllocPortable 0x01 2025-05-07T19:46:54.0097335Z #define cudaHostAllocWriteCombined 0x04 2025-05-07T19:46:54.0097442Z #define cudaHostRegisterDefault 0x00 2025-05-07T19:46:54.0097550Z #define cudaHostRegisterIoMemory 0x04 2025-05-07T19:46:54.0097671Z #define cudaHostRegisterMapped 0x02 2025-05-07T19:46:54.0097777Z #define cudaHostRegisterPortable 0x01 2025-05-07T19:46:54.0097888Z #define cudaHostRegisterReadOnly 0x08 2025-05-07T19:46:54.0098016Z #define cudaInitDeviceFlagsAreValid 0x01 2025-05-07T19:46:54.0098119Z #define cudaInvalidDeviceId ((int)-2) 2025-05-07T19:46:54.0098246Z #define cudaIpcMemLazyEnablePeerAccess 0x01 2025-05-07T19:46:54.0098391Z #define cudaKernelNodeAttrID cudaLaunchAttributeID 2025-05-07T19:46:54.0098579Z #define cudaKernelNodeAttrValue cudaLaunchAttributeValue 2025-05-07T19:46:54.0098913Z #define cudaKernelNodeAttributeAccessPolicyWindow cudaLaunchAttributeAccessPolicyWindow 2025-05-07T19:46:54.0099223Z #define cudaKernelNodeAttributeClusterDimension cudaLaunchAttributeClusterDimension 2025-05-07T19:46:54.0099739Z #define cudaKernelNodeAttributeClusterSchedulingPolicyPreference cudaLaunchAttributeClusterSchedulingPolicyPreference 2025-05-07T19:46:54.0099995Z #define cudaKernelNodeAttributeCooperative cudaLaunchAttributeCooperative 2025-05-07T19:46:54.0100402Z #define cudaKernelNodeAttributeDeviceUpdatableKernelNode cudaLaunchAttributeDeviceUpdatableKernelNode 2025-05-07T19:46:54.0100689Z #define cudaKernelNodeAttributeMemSyncDomain cudaLaunchAttributeMemSyncDomain 2025-05-07T19:46:54.0100995Z #define cudaKernelNodeAttributeMemSyncDomainMap cudaLaunchAttributeMemSyncDomainMap 2025-05-07T19:46:54.0101451Z #define cudaKernelNodeAttributePreferredSharedMemoryCarveout cudaLaunchAttributePreferredSharedMemoryCarveout 2025-05-07T19:46:54.0101695Z #define cudaKernelNodeAttributePriority cudaLaunchAttributePriority 2025-05-07T19:46:54.0101800Z #define cudaMemAttachGlobal 0x01 2025-05-07T19:46:54.0101901Z #define cudaMemAttachHost 0x02 2025-05-07T19:46:54.0102006Z #define cudaMemAttachSingle 0x04 2025-05-07T19:46:54.0102162Z #define cudaMemPoolCreateUsageHwDecompress 0x2 2025-05-07T19:46:54.0102268Z #define cudaNvSciSyncAttrSignal 0x1 2025-05-07T19:46:54.0102370Z #define cudaNvSciSyncAttrWait 0x2 2025-05-07T19:46:54.0102488Z #define cudaOccupancyDefault 0x00 2025-05-07T19:46:54.0102633Z #define cudaOccupancyDisableCachingOverride 0x01 2025-05-07T19:46:54.0102739Z #define cudaPeerAccessDefault 0x00 2025-05-07T19:46:54.0103093Z #define cudaSignalExternalSemaphoresAsync __CUDART_API_PTSZ(cudaSignalExternalSemaphoresAsync_v2) 2025-05-07T19:46:54.0103240Z #define cudaStreamAttrID cudaLaunchAttributeID 2025-05-07T19:46:54.0103388Z #define cudaStreamAttrValue cudaLaunchAttributeValue 2025-05-07T19:46:54.0103691Z #define cudaStreamAttributeAccessPolicyWindow cudaLaunchAttributeAccessPolicyWindow 2025-05-07T19:46:54.0104014Z #define cudaStreamAttributeMemSyncDomain cudaLaunchAttributeMemSyncDomain 2025-05-07T19:46:54.0104307Z #define cudaStreamAttributeMemSyncDomainMap cudaLaunchAttributeMemSyncDomainMap 2025-05-07T19:46:54.0104512Z #define cudaStreamAttributePriority cudaLaunchAttributePriority 2025-05-07T19:46:54.0104862Z #define cudaStreamAttributeSynchronizationPolicy cudaLaunchAttributeSynchronizationPolicy 2025-05-07T19:46:54.0104965Z #define cudaStreamDefault 0x00 2025-05-07T19:46:54.0105107Z #define cudaStreamFireAndForget ((cudaStream_t)0x4) 2025-05-07T19:46:54.0105387Z #define cudaStreamGetCaptureInfo __CUDART_API_PTSZ(cudaStreamGetCaptureInfo_v2) 2025-05-07T19:46:54.0105609Z #define cudaStreamGraphFireAndForget (cudaStream_t)0x0200000000000000 2025-05-07T19:46:54.0105872Z #define cudaStreamGraphFireAndForgetAsSibling (cudaStream_t)0x0300000000000000 2025-05-07T19:46:54.0106073Z #define cudaStreamGraphTailLaunch (cudaStream_t)0x0100000000000000 2025-05-07T19:46:54.0106214Z #define cudaStreamLegacy ((cudaStream_t)0x1) 2025-05-07T19:46:54.0106324Z #define cudaStreamNonBlocking 0x01 2025-05-07T19:46:54.0106457Z #define cudaStreamPerThread ((cudaStream_t)0x2) 2025-05-07T19:46:54.0106648Z #define cudaStreamTailLaunch ((cudaStream_t)0x3) 2025-05-07T19:46:54.0106752Z #define cudaSurfaceType1D 0x01 2025-05-07T19:46:54.0106866Z #define cudaSurfaceType1DLayered 0xF1 2025-05-07T19:46:54.0106992Z #define cudaSurfaceType2D 0x02 2025-05-07T19:46:54.0107105Z #define cudaSurfaceType2DLayered 0xF2 2025-05-07T19:46:54.0107207Z #define cudaSurfaceType3D 0x03 2025-05-07T19:46:54.0107318Z #define cudaSurfaceTypeCubemap 0x0C 2025-05-07T19:46:54.0107565Z #define cudaSurfaceTypeCubemapLayered 0xFC 2025-05-07T19:46:54.0107662Z #define cudaTextureType1D 0x01 2025-05-07T19:46:54.0107768Z #define cudaTextureType1DLayered 0xF1 2025-05-07T19:46:54.0107875Z #define cudaTextureType2D 0x02 2025-05-07T19:46:54.0107977Z #define cudaTextureType2DLayered 0xF2 2025-05-07T19:46:54.0108079Z #define cudaTextureType3D 0x03 2025-05-07T19:46:54.0108182Z #define cudaTextureTypeCubemap 0x0C 2025-05-07T19:46:54.0108315Z #define cudaTextureTypeCubemapLayered 0xFC 2025-05-07T19:46:54.0108630Z #define cudaWaitExternalSemaphoresAsync __CUDART_API_PTSZ(cudaWaitExternalSemaphoresAsync_v2) 2025-05-07T19:46:54.0108725Z #define getc(_fp) _IO_getc (_fp) 2025-05-07T19:46:54.0108832Z #define htobe16(x) __bswap_16 (x) 2025-05-07T19:46:54.0108927Z #define htobe32(x) __bswap_32 (x) 2025-05-07T19:46:54.0109019Z #define htobe64(x) __bswap_64 (x) 2025-05-07T19:46:54.0109102Z #define htole16(x) (x) 2025-05-07T19:46:54.0109205Z #define htole32(x) (x) 2025-05-07T19:46:54.0109286Z #define htole64(x) (x) 2025-05-07T19:46:54.0109397Z #define isalnum_l(c,l) __isalnum_l ((c), (l)) 2025-05-07T19:46:54.0109525Z #define isalpha_l(c,l) __isalpha_l ((c), (l)) 2025-05-07T19:46:54.0109618Z #define isascii(c) __isascii (c) 2025-05-07T19:46:54.0109729Z #define isascii_l(c,l) __isascii_l ((c), (l)) 2025-05-07T19:46:54.0109842Z #define isblank_l(c,l) __isblank_l ((c), (l)) 2025-05-07T19:46:54.0109966Z #define iscntrl_l(c,l) __iscntrl_l ((c), (l)) 2025-05-07T19:46:54.0110075Z #define isdigit_l(c,l) __isdigit_l ((c), (l)) 2025-05-07T19:46:54.0110186Z #define isgraph_l(c,l) __isgraph_l ((c), (l)) 2025-05-07T19:46:54.0110308Z #define islower_l(c,l) __islower_l ((c), (l)) 2025-05-07T19:46:54.0110418Z #define isprint_l(c,l) __isprint_l ((c), (l)) 2025-05-07T19:46:54.0110526Z #define ispunct_l(c,l) __ispunct_l ((c), (l)) 2025-05-07T19:46:54.0110649Z #define isspace_l(c,l) __isspace_l ((c), (l)) 2025-05-07T19:46:54.0110759Z #define isupper_l(c,l) __isupper_l ((c), (l)) 2025-05-07T19:46:54.0110877Z #define isxdigit_l(c,l) __isxdigit_l ((c), (l)) 2025-05-07T19:46:54.0110963Z #define le16toh(x) (x) 2025-05-07T19:46:54.0111066Z #define le32toh(x) (x) 2025-05-07T19:46:54.0111149Z #define le64toh(x) (x) 2025-05-07T19:46:54.0111231Z #define linux 1 2025-05-07T19:46:54.0111350Z #define major(dev) gnu_dev_major (dev) 2025-05-07T19:46:54.0111531Z #define makedev(maj,min) gnu_dev_makedev (maj, min) 2025-05-07T19:46:54.0111674Z #define math_errhandling (MATH_ERRNO | MATH_ERREXCEPT) 2025-05-07T19:46:54.0111775Z #define minor(dev) gnu_dev_minor (dev) 2025-05-07T19:46:54.0111909Z #define offsetof(t,d) __builtin_offsetof(t, d) 2025-05-07T19:46:54.0112014Z #define putc(_ch,_fp) _IO_putc (_ch, _fp) 2025-05-07T19:46:54.0112099Z #define stderr stderr 2025-05-07T19:46:54.0112201Z #define stdin stdin 2025-05-07T19:46:54.0112289Z #define stdout stdout 2025-05-07T19:46:54.0112757Z #define strdupa(s) (__extension__ ({ const char *__old = (s); size_t __len = strlen (__old) + 1; char *__new = (char *) __builtin_alloca (__len); (char *) memcpy (__new, __old, __len); })) 2025-05-07T19:46:54.0113292Z #define strndupa(s,n) (__extension__ ({ const char *__old = (s); size_t __len = strnlen (__old, (n)); char *__new = (char *) __builtin_alloca (__len + 1); __new[__len] = '\0'; (char *) memcpy (__new, __old, __len); })) 2025-05-07T19:46:54.0113387Z #define toascii(c) __toascii (c) 2025-05-07T19:46:54.0113500Z #define toascii_l(c,l) __toascii_l ((c), (l)) 2025-05-07T19:46:54.0113581Z #define unix 1 2025-05-07T19:46:54.0113721Z #define w_coredump __wait_terminated.__w_coredump 2025-05-07T19:46:54.0113897Z #define w_retcode __wait_terminated.__w_retcode 2025-05-07T19:46:54.0114007Z #define w_stopsig __wait_stopped.__w_stopsig 2025-05-07T19:46:54.0114125Z #define w_stopval __wait_stopped.__w_stopval 2025-05-07T19:46:54.0114239Z #define w_termsig __wait_terminated.__w_termsig 2025-05-07T19:46:54.0114246Z 2025-05-07T19:46:54.0387018Z 2025-05-07T19:46:54.0387733Z + conda run -n build_binary nvcc --version 2025-05-07T19:46:54.0387760Z 2025-05-07T19:46:55.8520146Z nvcc: NVIDIA (R) Cuda compiler driver 2025-05-07T19:46:55.8521188Z Copyright (c) 2005-2025 NVIDIA Corporation 2025-05-07T19:46:55.8522137Z Built on Wed_Jan_15_19:20:09_PST_2025 2025-05-07T19:46:55.8523067Z Cuda compilation tools, release 12.8, V12.8.61 2025-05-07T19:46:55.8524047Z Build cuda_12.8.r12.8/compiler.35404655_0 2025-05-07T19:46:55.8524714Z 2025-05-07T19:46:55.9081483Z 2025-05-07T19:46:55.9091790Z which: no nvidia-smi in (CONDA=/github/home/miniconda:/github/home/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:46:55.9094551Z [CHECK] nvidia-smi not found 2025-05-07T19:46:55.9094898Z [INSTALL] Successfully installed CUDA 12.8.0 2025-05-07T19:46:55.9200644Z ##[group]Run . $PRELUDE; install_pytorch_pip $BUILD_ENV nightly cuda/12.8.0 2025-05-07T19:46:55.9201385Z . $PRELUDE; install_pytorch_pip $BUILD_ENV nightly cuda/12.8.0 2025-05-07T19:46:55.9202015Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:46:55.9202365Z env: 2025-05-07T19:46:55.9202619Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:46:55.9202930Z BUILD_ENV: build_binary 2025-05-07T19:46:55.9203213Z BUILD_TARGET: genai 2025-05-07T19:46:55.9203455Z BUILD_VARIANT: cuda 2025-05-07T19:46:55.9203725Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:46:55.9203983Z ##[endgroup] 2025-05-07T19:46:56.4089606Z ################################################################################ 2025-05-07T19:46:56.4090130Z # Install PyTorch (PIP) 2025-05-07T19:46:56.4090390Z # 2025-05-07T19:46:56.4102764Z # [2025-05-07T19:46:56.409Z] + install_pytorch_pip build_binary nightly cuda/12.8.0 2025-05-07T19:46:56.4103357Z ################################################################################ 2025-05-07T19:46:56.4103611Z 2025-05-07T19:46:56.4132072Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y numpy 2025-05-07T19:46:57.3599837Z Channels: 2025-05-07T19:46:57.3600160Z - conda-forge 2025-05-07T19:46:57.3600410Z Platform: linux-64 2025-05-07T19:47:00.3913521Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:47:02.1100140Z Solving environment: \ | / - done 2025-05-07T19:47:02.4279984Z 2025-05-07T19:47:02.4280207Z ## Package Plan ## 2025-05-07T19:47:02.4280461Z 2025-05-07T19:47:02.4280685Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:47:02.4281271Z 2025-05-07T19:47:02.4281372Z added / updated specs: 2025-05-07T19:47:02.4281636Z - numpy 2025-05-07T19:47:02.4281757Z 2025-05-07T19:47:02.4281762Z 2025-05-07T19:47:02.4281885Z The following packages will be downloaded: 2025-05-07T19:47:02.4282125Z 2025-05-07T19:47:02.4282254Z package | build 2025-05-07T19:47:02.4282597Z ---------------------------|----------------- 2025-05-07T19:47:02.4282993Z libblas-3.9.0 |31_h59b9bed_openblas 16 KB conda-forge 2025-05-07T19:47:02.4283479Z libcblas-3.9.0 |31_he106b2a_openblas 16 KB conda-forge 2025-05-07T19:47:02.4283953Z liblapack-3.9.0 |31_h7ac8fdf_openblas 16 KB conda-forge 2025-05-07T19:47:02.4284418Z numpy-2.2.5 | py313h17eae1a_0 8.1 MB conda-forge 2025-05-07T19:47:02.4284819Z ------------------------------------------------------------ 2025-05-07T19:47:02.4285189Z Total: 8.2 MB 2025-05-07T19:47:02.4285409Z 2025-05-07T19:47:02.4285557Z The following NEW packages will be INSTALLED: 2025-05-07T19:47:02.4285784Z 2025-05-07T19:47:02.4286015Z libblas conda-forge/linux-64::libblas-3.9.0-31_h59b9bed_openblas 2025-05-07T19:47:02.4286569Z libcblas conda-forge/linux-64::libcblas-3.9.0-31_he106b2a_openblas 2025-05-07T19:47:02.4287129Z liblapack conda-forge/linux-64::liblapack-3.9.0-31_h7ac8fdf_openblas 2025-05-07T19:47:02.4287633Z numpy conda-forge/linux-64::numpy-2.2.5-py313h17eae1a_0 2025-05-07T19:47:02.4287915Z 2025-05-07T19:47:02.4287919Z 2025-05-07T19:47:02.4287939Z 2025-05-07T19:47:02.4288088Z Downloading and Extracting Packages: ...working... 2025-05-07T19:47:02.4288467Z numpy-2.2.5 | 8.1 MB | | 0% 2025-05-07T19:47:02.4288718Z 2025-05-07T19:47:02.4289121Z libblas-3.9.0 | 16 KB | | 0%  2025-05-07T19:47:02.4289373Z 2025-05-07T19:47:02.4289377Z 2025-05-07T19:47:02.4314851Z libcblas-3.9.0 | 16 KB | | 0%  2025-05-07T19:47:02.4315144Z 2025-05-07T19:47:02.4315149Z 2025-05-07T19:47:02.4315153Z 2025-05-07T19:47:02.5464839Z liblapack-3.9.0 | 16 KB | | 0%  2025-05-07T19:47:02.5465157Z 2025-05-07T19:47:02.5466419Z 2025-05-07T19:47:02.5469962Z libcblas-3.9.0 | 16 KB | #########7 | 98%  2025-05-07T19:47:02.5470243Z 2025-05-07T19:47:02.5470247Z 2025-05-07T19:47:02.5470623Z 2025-05-07T19:47:02.5494304Z liblapack-3.9.0 | 16 KB | #########7 | 98%  2025-05-07T19:47:02.5494601Z 2025-05-07T19:47:02.5494606Z 2025-05-07T19:47:02.5505182Z libcblas-3.9.0 | 16 KB | ########## | 100%  2025-05-07T19:47:02.5505486Z 2025-05-07T19:47:02.5505491Z 2025-05-07T19:47:02.5512248Z 2025-05-07T19:47:02.6286070Z liblapack-3.9.0 | 16 KB | ########## | 100%  2025-05-07T19:47:02.6439670Z numpy-2.2.5 | 8.1 MB | | 0% 2025-05-07T19:47:02.6440049Z 2025-05-07T19:47:02.6440223Z 2025-05-07T19:47:02.6458049Z libcblas-3.9.0 | 16 KB | ########## | 100%  2025-05-07T19:47:02.6458349Z 2025-05-07T19:47:02.6458353Z 2025-05-07T19:47:02.6458357Z 2025-05-07T19:47:02.6726504Z liblapack-3.9.0 | 16 KB | ########## | 100%  2025-05-07T19:47:02.6727119Z 2025-05-07T19:47:02.6731374Z libblas-3.9.0 | 16 KB | #########7 | 97%  2025-05-07T19:47:02.6731640Z 2025-05-07T19:47:02.6950014Z libblas-3.9.0 | 16 KB | ########## | 100%  2025-05-07T19:47:02.6950390Z 2025-05-07T19:47:02.7225653Z libblas-3.9.0 | 16 KB | ########## | 100%  2025-05-07T19:47:03.0576730Z numpy-2.2.5 | 8.1 MB | ########## | 100% 2025-05-07T19:47:03.0577163Z numpy-2.2.5 | 8.1 MB | ########## | 100% 2025-05-07T19:47:03.0581008Z numpy-2.2.5 | 8.1 MB | ########## | 100% 2025-05-07T19:47:03.0581367Z 2025-05-07T19:47:03.0581818Z 2025-05-07T19:47:03.0582234Z  2025-05-07T19:47:03.0582452Z 2025-05-07T19:47:03.0582456Z 2025-05-07T19:47:03.0582626Z  2025-05-07T19:47:03.0582852Z 2025-05-07T19:47:03.0582858Z 2025-05-07T19:47:03.0582873Z 2025-05-07T19:47:03.0586066Z  done 2025-05-07T19:47:03.1592682Z Preparing transaction: | done 2025-05-07T19:47:03.3608287Z Verifying transaction: - \ done 2025-05-07T19:47:03.4618908Z Executing transaction: / done 2025-05-07T19:47:03.5714685Z ################################################################################ 2025-05-07T19:47:03.5715143Z # Install Package From PyTorch PIP: torch 2025-05-07T19:47:03.5715464Z # 2025-05-07T19:47:03.5740452Z # [2025-05-07T19:47:03.573Z] + install_from_pytorch_pip build_binary torch nightly cuda/12.8.0 2025-05-07T19:47:03.5741014Z ################################################################################ 2025-05-07T19:47:03.5741244Z 2025-05-07T19:47:03.5761742Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:47:03.6628530Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:47:03.6629644Z ################################################################################ 2025-05-07T19:47:03.6630654Z # Prepare PIP Arguments (PyTorch PIP) 2025-05-07T19:47:03.6631451Z # 2025-05-07T19:47:03.6650873Z # [2025-05-07T19:47:03.664Z] + __prepare_pip_arguments torch nightly cuda/12.8.0 2025-05-07T19:47:03.6652215Z ################################################################################ 2025-05-07T19:47:03.6652898Z 2025-05-07T19:47:03.6676618Z [INSTALL] Extracted package (channel, version): (nightly, LATEST) 2025-05-07T19:47:03.6704501Z [INSTALL] Extracted package variant: cu128 2025-05-07T19:47:03.6719290Z [INSTALL] Using a non-RELEASE channel: nightly ... 2025-05-07T19:47:03.6719898Z [INSTALL] Extracted the full PIP channel: https://download.pytorch.org/whl/nightly/cu128/ 2025-05-07T19:47:03.6727223Z [INSTALL] Extracted the full PIP package: --pre torch 2025-05-07T19:47:03.6739589Z [INSTALL] Attempting to install [torch, LATEST] from PyTorch PIP using channel https://download.pytorch.org/whl/nightly/cu128/ ... 2025-05-07T19:47:03.6764734Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128/ 2025-05-07T19:48:53.9648633Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-05-07T19:48:53.9650174Z 2025-05-07T19:48:53.9650406Z Looking in indexes: https://download.pytorch.org/whl/nightly/cu128/ 2025-05-07T19:48:53.9650860Z Collecting torch 2025-05-07T19:48:53.9651575Z Downloading https://download.pytorch.org/whl/nightly/cu128/torch-2.8.0.dev20250507%2Bcu128-cp313-cp313-manylinux_2_28_x86_64.whl.metadata (30 kB) 2025-05-07T19:48:53.9652360Z Collecting filelock (from torch) 2025-05-07T19:48:53.9652947Z Downloading https://download.pytorch.org/whl/nightly/filelock-3.16.1-py3-none-any.whl (16 kB) 2025-05-07T19:48:53.9653984Z Requirement already satisfied: typing-extensions>=4.10.0 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from torch) (4.13.2) 2025-05-07T19:48:53.9655151Z Requirement already satisfied: setuptools in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from torch) (78.1.1) 2025-05-07T19:48:53.9655978Z Collecting sympy>=1.13.3 (from torch) 2025-05-07T19:48:53.9656522Z Downloading https://download.pytorch.org/whl/nightly/sympy-1.13.3-py3-none-any.whl (6.2 MB) 2025-05-07T19:48:53.9657466Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 27.4 MB/s eta 0:00:00 2025-05-07T19:48:53.9658200Z Collecting networkx (from torch) 2025-05-07T19:48:53.9658718Z Downloading https://download.pytorch.org/whl/nightly/networkx-3.4.2-py3-none-any.whl (1.7 MB) 2025-05-07T19:48:53.9659440Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 19.7 MB/s eta 0:00:00 2025-05-07T19:48:53.9660189Z Requirement already satisfied: jinja2 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from torch) (3.1.6) 2025-05-07T19:48:53.9660902Z Collecting fsspec (from torch) 2025-05-07T19:48:53.9661436Z Downloading https://download.pytorch.org/whl/nightly/fsspec-2024.10.0-py3-none-any.whl (179 kB) 2025-05-07T19:48:53.9662145Z Collecting nvidia-cuda-nvrtc-cu12==12.8.61 (from torch) 2025-05-07T19:48:53.9662986Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_nvrtc_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:53.9663811Z Collecting nvidia-cuda-runtime-cu12==12.8.57 (from torch) 2025-05-07T19:48:53.9664862Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_runtime_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:53.9665749Z Collecting nvidia-cuda-cupti-cu12==12.8.57 (from torch) 2025-05-07T19:48:53.9666604Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_cupti_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:53.9667460Z Collecting nvidia-cudnn-cu12==9.8.0.87 (from torch) 2025-05-07T19:48:53.9668190Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cudnn_cu12-9.8.0.87-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB) 2025-05-07T19:48:53.9668948Z Collecting nvidia-cublas-cu12==12.8.3.14 (from torch) 2025-05-07T19:48:53.9669706Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:53.9670451Z Collecting nvidia-cufft-cu12==11.3.3.41 (from torch) 2025-05-07T19:48:53.9671301Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufft_cu12-11.3.3.41-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB) 2025-05-07T19:48:53.9672142Z Collecting nvidia-curand-cu12==10.3.9.55 (from torch) 2025-05-07T19:48:53.9673059Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_curand_cu12-10.3.9.55-py3-none-manylinux_2_27_x86_64.whl.metadata (1.5 kB) 2025-05-07T19:48:53.9673830Z Collecting nvidia-cusolver-cu12==11.7.2.55 (from torch) 2025-05-07T19:48:53.9674587Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusolver_cu12-11.7.2.55-py3-none-manylinux_2_27_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:48:53.9675706Z Collecting nvidia-cusparse-cu12==12.5.7.53 (from torch) 2025-05-07T19:48:53.9676578Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparse_cu12-12.5.7.53-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:48:53.9677470Z Collecting nvidia-cusparselt-cu12==0.6.3 (from torch) 2025-05-07T19:48:53.9678254Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl.metadata (6.8 kB) 2025-05-07T19:48:53.9679013Z Collecting nvidia-nccl-cu12==2.26.2 (from torch) 2025-05-07T19:48:53.9679850Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nccl_cu12-2.26.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB) 2025-05-07T19:48:53.9680678Z Collecting nvidia-nvtx-cu12==12.8.55 (from torch) 2025-05-07T19:48:53.9681579Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvtx_cu12-12.8.55-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:48:53.9682379Z Collecting nvidia-nvjitlink-cu12==12.8.61 (from torch) 2025-05-07T19:48:53.9683187Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvjitlink_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:53.9684283Z Collecting nvidia-cufile-cu12==1.13.0.11 (from torch) 2025-05-07T19:48:53.9685115Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufile_cu12-1.13.0.11-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB) 2025-05-07T19:48:53.9685989Z Collecting pytorch-triton==3.3.0+git96316ce5 (from torch) 2025-05-07T19:48:53.9686874Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.3.0%2Bgit96316ce5-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:48:53.9687732Z Collecting mpmath<1.4,>=1.1.0 (from sympy>=1.13.3->torch) 2025-05-07T19:48:53.9688317Z Downloading https://download.pytorch.org/whl/nightly/mpmath-1.3.0-py3-none-any.whl (536 kB) 2025-05-07T19:48:53.9689002Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 2.2 MB/s eta 0:00:00 2025-05-07T19:48:53.9689795Z Requirement already satisfied: MarkupSafe>=2.0 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from jinja2->torch) (3.0.2) 2025-05-07T19:48:53.9690957Z Downloading https://download.pytorch.org/whl/nightly/cu128/torch-2.8.0.dev20250507%2Bcu128-cp313-cp313-manylinux_2_28_x86_64.whl (1047.0 MB) 2025-05-07T19:48:53.9691834Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 GB 27.8 MB/s eta 0:00:00 2025-05-07T19:48:53.9692570Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_x86_64.whl (609.6 MB) 2025-05-07T19:48:53.9693383Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 609.6/609.6 MB 40.1 MB/s eta 0:00:00 2025-05-07T19:48:53.9694209Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_cupti_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (10.2 MB) 2025-05-07T19:48:53.9695134Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.2/10.2 MB 37.6 MB/s eta 0:00:00 2025-05-07T19:48:53.9696186Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_nvrtc_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (88.0 MB) 2025-05-07T19:48:53.9697154Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.0/88.0 MB 73.0 MB/s eta 0:00:00 2025-05-07T19:48:53.9698132Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_runtime_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (954 kB) 2025-05-07T19:48:53.9699092Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 954.8/954.8 kB 9.7 MB/s eta 0:00:00 2025-05-07T19:48:53.9699843Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cudnn_cu12-9.8.0.87-py3-none-manylinux_2_27_x86_64.whl (698.0 MB) 2025-05-07T19:48:53.9700673Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 698.0/698.0 MB 35.5 MB/s eta 0:00:00 2025-05-07T19:48:53.9701514Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufft_cu12-11.3.3.41-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (193.1 MB) 2025-05-07T19:48:53.9702433Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 193.1/193.1 MB 82.9 MB/s eta 0:00:00 2025-05-07T19:48:53.9703282Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufile_cu12-1.13.0.11-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.2 MB) 2025-05-07T19:48:53.9704203Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 13.4 MB/s eta 0:00:00 2025-05-07T19:48:53.9704944Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_curand_cu12-10.3.9.55-py3-none-manylinux_2_27_x86_64.whl (63.6 MB) 2025-05-07T19:48:53.9705783Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.6/63.6 MB 59.5 MB/s eta 0:00:00 2025-05-07T19:48:53.9706527Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusolver_cu12-11.7.2.55-py3-none-manylinux_2_27_x86_64.whl (260.4 MB) 2025-05-07T19:48:53.9707382Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 260.4/260.4 MB 65.7 MB/s eta 0:00:00 2025-05-07T19:48:53.9708301Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparse_cu12-12.5.7.53-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (292.1 MB) 2025-05-07T19:48:53.9709236Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.1/292.1 MB 81.9 MB/s eta 0:00:00 2025-05-07T19:48:53.9709937Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl (156.8 MB) 2025-05-07T19:48:53.9710723Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 156.8/156.8 MB 75.4 MB/s eta 0:00:00 2025-05-07T19:48:53.9711484Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nccl_cu12-2.26.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (201.3 MB) 2025-05-07T19:48:53.9712339Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.3/201.3 MB 68.0 MB/s eta 0:00:00 2025-05-07T19:48:53.9713110Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvjitlink_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.2 MB) 2025-05-07T19:48:53.9713981Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.2/39.2 MB 58.9 MB/s eta 0:00:00 2025-05-07T19:48:53.9714745Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvtx_cu12-12.8.55-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89 kB) 2025-05-07T19:48:53.9716086Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.3.0%2Bgit96316ce5-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (153.5 MB) 2025-05-07T19:48:53.9717324Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 153.5/153.5 MB 82.0 MB/s eta 0:00:00 2025-05-07T19:48:53.9719097Z Installing collected packages: nvidia-cusparselt-cu12, mpmath, sympy, pytorch-triton, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufile-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, networkx, fsspec, filelock, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, nvidia-cusolver-cu12, torch 2025-05-07T19:48:53.9720750Z 2025-05-07T19:48:53.9722777Z Successfully installed filelock-3.16.1 fsspec-2024.10.0 mpmath-1.3.0 networkx-3.4.2 nvidia-cublas-cu12-12.8.3.14 nvidia-cuda-cupti-cu12-12.8.57 nvidia-cuda-nvrtc-cu12-12.8.61 nvidia-cuda-runtime-cu12-12.8.57 nvidia-cudnn-cu12-9.8.0.87 nvidia-cufft-cu12-11.3.3.41 nvidia-cufile-cu12-1.13.0.11 nvidia-curand-cu12-10.3.9.55 nvidia-cusolver-cu12-11.7.2.55 nvidia-cusparse-cu12-12.5.7.53 nvidia-cusparselt-cu12-0.6.3 nvidia-nccl-cu12-2.26.2 nvidia-nvjitlink-cu12-12.8.61 nvidia-nvtx-cu12-12.8.55 pytorch-triton-3.3.0+git96316ce5 sympy-1.13.3 torch-2.8.0.dev20250507+cu128 2025-05-07T19:48:53.9724915Z 2025-05-07T19:48:56.1135868Z torch 2.8.0.dev20250507+cu128 2025-05-07T19:48:56.1136488Z [CHECK] The installed package [torch, nightly/LATEST] is the correct variant (cu128) 2025-05-07T19:48:59.4686399Z [CHECK] Python (sub-)package 'torch.distributed' found ... 2025-05-07T19:49:02.8318836Z [CHECK] NOTE: The installed version is: 2.8.0.dev20250507+cu128 2025-05-07T19:49:02.8319362Z [CHECK] NOTE: Checking _GLIBCXX_USE_CXX11_ABI ... 2025-05-07T19:49:06.1538473Z True 2025-05-07T19:49:06.1539131Z True 2025-05-07T19:49:06.1539242Z 2025-05-07T19:49:06.2131012Z [INSTALL] Successfully installed PyTorch through PyTorch PIP 2025-05-07T19:49:06.2223569Z ##[group]Run if . $PRELUDE && which conda; then collect_pytorch_env_info $BUILD_ENV; fi 2025-05-07T19:49:06.2224228Z if . $PRELUDE && which conda; then collect_pytorch_env_info $BUILD_ENV; fi 2025-05-07T19:49:06.2224850Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:49:06.2225296Z env: 2025-05-07T19:49:06.2225617Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:49:06.2225916Z BUILD_ENV: build_binary 2025-05-07T19:49:06.2226165Z BUILD_TARGET: genai 2025-05-07T19:49:06.2226379Z BUILD_VARIANT: cuda 2025-05-07T19:49:06.2226611Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:49:06.2226844Z ##[endgroup] 2025-05-07T19:49:06.6565046Z /github/home/miniconda/bin/conda 2025-05-07T19:49:06.6566005Z ################################################################################ 2025-05-07T19:49:06.6567255Z # Collect PyTorch Environment Information (for Reporting Issues) 2025-05-07T19:49:06.6568397Z # 2025-05-07T19:49:06.6579177Z # [2025-05-07T19:49:06.657Z] + collect_pytorch_env_info build_binary 2025-05-07T19:49:06.6579720Z ################################################################################ 2025-05-07T19:49:06.6580002Z 2025-05-07T19:49:06.6596581Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:49:06.7457110Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:49:06.7462223Z [INFO] Downloading the PyTorch environment info collection script ... 2025-05-07T19:49:06.7462951Z + wget -q https://raw.githubusercontent.com/pytorch/pytorch/main/torch/utils/collect_env.py 2025-05-07T19:49:06.7463395Z 2025-05-07T19:49:06.8278461Z 2025-05-07T19:49:06.8279659Z [INFO] Collecting PyTorch environment info (will be needed for reporting issues to PyTorch) ... 2025-05-07T19:49:06.8298948Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary python collect_env.py 2025-05-07T19:49:12.4209392Z Collecting environment information... 2025-05-07T19:49:12.4209849Z PyTorch version: 2.8.0.dev20250507+cu128 2025-05-07T19:49:12.4210169Z Is debug build: False 2025-05-07T19:49:12.4210440Z CUDA used to build PyTorch: 12.8 2025-05-07T19:49:12.4210744Z ROCM used to build PyTorch: N/A 2025-05-07T19:49:12.4210974Z 2025-05-07T19:49:12.4211082Z OS: Amazon Linux 2023.7.20250428 (x86_64) 2025-05-07T19:49:12.4211408Z GCC version: Could not collect 2025-05-07T19:49:12.4212018Z Clang version: 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:49:12.4212619Z CMake version: version 4.0.2 2025-05-07T19:49:12.4212902Z Libc version: glibc-2.34 2025-05-07T19:49:12.4213064Z 2025-05-07T19:49:12.4213381Z Python version: 3.13.2 | packaged by conda-forge | (main, Feb 17 2025, 14:10:22) [GCC 13.3.0] (64-bit runtime) 2025-05-07T19:49:12.4214046Z Python platform: Linux-6.1.130-139.222.amzn2023.x86_64-x86_64-with-glibc2.34 2025-05-07T19:49:12.4214482Z Is CUDA available: False 2025-05-07T19:49:12.4214769Z CUDA runtime version: 12.8.61 2025-05-07T19:49:12.4215054Z CUDA_MODULE_LOADING set to: N/A 2025-05-07T19:49:12.4215459Z GPU models and configuration: Could not collect 2025-05-07T19:49:12.4215826Z Nvidia driver version: Could not collect 2025-05-07T19:49:12.4216137Z cuDNN version: Could not collect 2025-05-07T19:49:12.4216426Z HIP runtime version: N/A 2025-05-07T19:49:12.4216683Z MIOpen runtime version: N/A 2025-05-07T19:49:12.4216959Z Is XNNPACK available: True 2025-05-07T19:49:12.4217123Z 2025-05-07T19:49:12.4217206Z CPU: 2025-05-07T19:49:12.4217439Z Architecture: x86_64 2025-05-07T19:49:12.4217782Z CPU op-mode(s): 32-bit, 64-bit 2025-05-07T19:49:12.4218202Z Address sizes: 46 bits physical, 48 bits virtual 2025-05-07T19:49:12.4218623Z Byte Order: Little Endian 2025-05-07T19:49:12.4218951Z CPU(s): 96 2025-05-07T19:49:12.4219273Z On-line CPU(s) list: 0-95 2025-05-07T19:49:12.4223280Z Vendor ID: GenuineIntel 2025-05-07T19:49:12.4223846Z Model name: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:49:12.4224219Z CPU family: 6 2025-05-07T19:49:12.4224681Z Model: 85 2025-05-07T19:49:12.4224985Z Thread(s) per core: 2 2025-05-07T19:49:12.4225278Z Core(s) per socket: 24 2025-05-07T19:49:12.4225580Z Socket(s): 2 2025-05-07T19:49:12.4226035Z Stepping: 7 2025-05-07T19:49:12.4226354Z BogoMIPS: 6000.01 2025-05-07T19:49:12.4228757Z Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:49:12.4231319Z Hypervisor vendor: KVM 2025-05-07T19:49:12.4231643Z Virtualization type: full 2025-05-07T19:49:12.4232010Z L1d cache: 1.5 MiB (48 instances) 2025-05-07T19:49:12.4232394Z L1i cache: 1.5 MiB (48 instances) 2025-05-07T19:49:12.4232790Z L2 cache: 48 MiB (48 instances) 2025-05-07T19:49:12.4233161Z L3 cache: 71.5 MiB (2 instances) 2025-05-07T19:49:12.4233514Z NUMA node(s): 2 2025-05-07T19:49:12.4233945Z NUMA node0 CPU(s): 0-23,48-71 2025-05-07T19:49:12.4234289Z NUMA node1 CPU(s): 24-47,72-95 2025-05-07T19:49:12.4234768Z Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status 2025-05-07T19:49:12.4235322Z Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported 2025-05-07T19:49:12.4235823Z Vulnerability L1tf: Mitigation; PTE Inversion 2025-05-07T19:49:12.4236419Z Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:49:12.4237007Z Vulnerability Meltdown: Mitigation; PTI 2025-05-07T19:49:12.4237627Z Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:49:12.4238230Z Vulnerability Reg file data sampling: Not affected 2025-05-07T19:49:12.4238618Z Vulnerability Retbleed: Vulnerable 2025-05-07T19:49:12.4238985Z Vulnerability Spec rstack overflow: Not affected 2025-05-07T19:49:12.4239369Z Vulnerability Spec store bypass: Vulnerable 2025-05-07T19:49:12.4239927Z Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization 2025-05-07T19:49:12.4240768Z Vulnerability Spectre v2: Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline 2025-05-07T19:49:12.4241424Z Vulnerability Srbds: Not affected 2025-05-07T19:49:12.4241791Z Vulnerability Tsx async abort: Not affected 2025-05-07T19:49:12.4242030Z 2025-05-07T19:49:12.4242149Z Versions of relevant libraries: 2025-05-07T19:49:12.4242417Z [pip3] numpy==2.2.5 2025-05-07T19:49:12.4242675Z [pip3] nvidia-cublas-cu12==12.8.3.14 2025-05-07T19:49:12.4242975Z [pip3] nvidia-cuda-cupti-cu12==12.8.57 2025-05-07T19:49:12.4243294Z [pip3] nvidia-cuda-nvrtc-cu12==12.8.61 2025-05-07T19:49:12.4243601Z [pip3] nvidia-cuda-runtime-cu12==12.8.57 2025-05-07T19:49:12.4243921Z [pip3] nvidia-cudnn-cu12==9.8.0.87 2025-05-07T19:49:12.4244218Z [pip3] nvidia-cufft-cu12==11.3.3.41 2025-05-07T19:49:12.4244591Z [pip3] nvidia-curand-cu12==10.3.9.55 2025-05-07T19:49:12.4244897Z [pip3] nvidia-cusolver-cu12==11.7.2.55 2025-05-07T19:49:12.4245311Z [pip3] nvidia-cusparse-cu12==12.5.7.53 2025-05-07T19:49:12.4245636Z [pip3] nvidia-cusparselt-cu12==0.6.3 2025-05-07T19:49:12.4245928Z [pip3] nvidia-nccl-cu12==2.26.2 2025-05-07T19:49:12.4246223Z [pip3] nvidia-nvjitlink-cu12==12.8.61 2025-05-07T19:49:12.4246522Z [pip3] nvidia-nvtx-cu12==12.8.55 2025-05-07T19:49:12.4246825Z [pip3] pytorch-triton==3.3.0+git96316ce5 2025-05-07T19:49:12.4247144Z [pip3] torch==2.8.0.dev20250507+cu128 2025-05-07T19:49:12.4247519Z [conda] cuda-cudart 12.8.57 h5888daf_1 conda-forge 2025-05-07T19:49:12.4248032Z [conda] cuda-cudart-dev 12.8.57 h5888daf_1 conda-forge 2025-05-07T19:49:12.4248561Z [conda] cuda-cudart-dev_linux-64 12.8.57 h3f2d84a_1 conda-forge 2025-05-07T19:49:12.4249113Z [conda] cuda-cudart-static 12.8.57 h5888daf_1 conda-forge 2025-05-07T19:49:12.4249668Z [conda] cuda-cudart-static_linux-64 12.8.57 h3f2d84a_1 conda-forge 2025-05-07T19:49:12.4250234Z [conda] cuda-cudart_linux-64 12.8.57 h3f2d84a_1 conda-forge 2025-05-07T19:49:12.4250743Z [conda] cuda-cupti 12.8.57 hbd13f7d_0 conda-forge 2025-05-07T19:49:12.4251219Z [conda] cuda-cupti-dev 12.8.57 h5888daf_0 conda-forge 2025-05-07T19:49:12.4251728Z [conda] cuda-libraries 12.8.0 ha770c72_0 conda-forge 2025-05-07T19:49:12.4252235Z [conda] cuda-libraries-dev 12.8.0 ha770c72_0 conda-forge 2025-05-07T19:49:12.4252738Z [conda] cuda-nvrtc 12.8.61 hbd13f7d_0 conda-forge 2025-05-07T19:49:12.4253213Z [conda] cuda-nvrtc-dev 12.8.61 h5888daf_0 conda-forge 2025-05-07T19:49:12.4253701Z [conda] cuda-nvtx 12.8.55 hbd13f7d_0 conda-forge 2025-05-07T19:49:12.4254180Z [conda] cuda-opencl 12.8.55 hbd13f7d_0 conda-forge 2025-05-07T19:49:12.4254678Z [conda] cuda-opencl-dev 12.8.55 h5888daf_0 conda-forge 2025-05-07T19:49:12.4255265Z [conda] cuda-runtime 12.8.0 ha804496_0 conda-forge 2025-05-07T19:49:12.4255915Z [conda] libcublas 12.8.3.14 h9ab20c4_0 conda-forge 2025-05-07T19:49:12.4256425Z [conda] libcublas-dev 12.8.3.14 h9ab20c4_0 conda-forge 2025-05-07T19:49:12.4256938Z [conda] libcufft 11.3.3.41 hbd13f7d_0 conda-forge 2025-05-07T19:49:12.4257420Z [conda] libcufft-dev 11.3.3.41 h5888daf_0 conda-forge 2025-05-07T19:49:12.4257927Z [conda] libcurand 10.3.9.55 hbd13f7d_0 conda-forge 2025-05-07T19:49:12.4258420Z [conda] libcurand-dev 10.3.9.55 h5888daf_0 conda-forge 2025-05-07T19:49:12.4258929Z [conda] libcusolver 11.7.2.55 h9ab20c4_0 conda-forge 2025-05-07T19:49:12.4259446Z [conda] libcusolver-dev 11.7.2.55 h9ab20c4_0 conda-forge 2025-05-07T19:49:12.4259967Z [conda] libcusparse 12.5.7.53 hbd13f7d_0 conda-forge 2025-05-07T19:49:12.4260491Z [conda] libcusparse-dev 12.5.7.53 h5888daf_0 conda-forge 2025-05-07T19:49:12.4261000Z [conda] libnvjitlink 12.8.61 hbd13f7d_0 conda-forge 2025-05-07T19:49:12.4261526Z [conda] libnvjitlink-dev 12.8.61 h5888daf_0 conda-forge 2025-05-07T19:49:12.4262011Z [conda] numpy 2.2.5 py313h17eae1a_0 conda-forge 2025-05-07T19:49:12.4262509Z [conda] nvidia-cublas-cu12 12.8.3.14 pypi_0 pypi 2025-05-07T19:49:12.4263133Z [conda] nvidia-cuda-cupti-cu12 12.8.57 pypi_0 pypi 2025-05-07T19:49:12.4263655Z [conda] nvidia-cuda-nvrtc-cu12 12.8.61 pypi_0 pypi 2025-05-07T19:49:12.4264267Z [conda] nvidia-cuda-runtime-cu12 12.8.57 pypi_0 pypi 2025-05-07T19:49:12.4264835Z [conda] nvidia-cudnn-cu12 9.8.0.87 pypi_0 pypi 2025-05-07T19:49:12.4265337Z [conda] nvidia-cufft-cu12 11.3.3.41 pypi_0 pypi 2025-05-07T19:49:12.4265826Z [conda] nvidia-curand-cu12 10.3.9.55 pypi_0 pypi 2025-05-07T19:49:12.4266337Z [conda] nvidia-cusolver-cu12 11.7.2.55 pypi_0 pypi 2025-05-07T19:49:12.4266855Z [conda] nvidia-cusparse-cu12 12.5.7.53 pypi_0 pypi 2025-05-07T19:49:12.4267367Z [conda] nvidia-cusparselt-cu12 0.6.3 pypi_0 pypi 2025-05-07T19:49:12.4267885Z [conda] nvidia-nccl-cu12 2.26.2 pypi_0 pypi 2025-05-07T19:49:12.4268380Z [conda] nvidia-nvjitlink-cu12 12.8.61 pypi_0 pypi 2025-05-07T19:49:12.4268885Z [conda] nvidia-nvtx-cu12 12.8.55 pypi_0 pypi 2025-05-07T19:49:12.4269373Z [conda] pytorch-triton 3.3.0+git96316ce5 pypi_0 pypi 2025-05-07T19:49:12.4269862Z [conda] torch 2.8.0.dev20250507+cu128 pypi_0 pypi 2025-05-07T19:49:12.4270143Z 2025-05-07T19:49:12.5122748Z ##[group]Run . $PRELUDE; install_cudnn $BUILD_ENV "$(pwd)/build_only/cudnn" 12.8.0 2025-05-07T19:49:12.5123371Z . $PRELUDE; install_cudnn $BUILD_ENV "$(pwd)/build_only/cudnn" 12.8.0 2025-05-07T19:49:12.5124139Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:49:12.5124469Z env: 2025-05-07T19:49:12.5124685Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:49:12.5125002Z BUILD_ENV: build_binary 2025-05-07T19:49:12.5125243Z BUILD_TARGET: genai 2025-05-07T19:49:12.5125481Z BUILD_VARIANT: cuda 2025-05-07T19:49:12.5125707Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:49:12.5125965Z ##[endgroup] 2025-05-07T19:49:12.9524847Z ################################################################################ 2025-05-07T19:49:12.9525344Z # Install cuDNN 2025-05-07T19:49:12.9525581Z # 2025-05-07T19:49:12.9538316Z # [2025-05-07T19:49:12.953Z] + install_cudnn build_binary /__w/FBGEMM/FBGEMM/build_only/cudnn 12.8.0 2025-05-07T19:49:12.9539968Z ################################################################################ 2025-05-07T19:49:12.9540655Z 2025-05-07T19:49:12.9554050Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:49:13.0379076Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:49:13.0380483Z [INSTALL] cuda_concat_version is determined to be: 128 2025-05-07T19:49:13.0382228Z [INSTALL] Could not find cuDNN URL for the given cuda_concat_version 128; defaulting to cuDNN for CUDA 11.8 2025-05-07T19:49:13.0383005Z + rm -rf /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:49:13.0383230Z 2025-05-07T19:49:13.0395256Z 2025-05-07T19:49:13.0395918Z + mkdir -p /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:49:13.0396681Z 2025-05-07T19:49:13.0409864Z 2025-05-07T19:49:13.0425723Z [INSTALL] Downloading cuDNN to /tmp/tmp.nrQ6CGFT9L ... 2025-05-07T19:49:13.0448000Z [EXEC] [ATTEMPT 0/3] + wget -q https://developer.download.nvidia.com/compute/redist/cudnn/v8.7.0/local_installers/11.8/cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz -O cudnn.tar.xz 2025-05-07T19:49:29.1347069Z [INSTALL] Unpacking cuDNN ... 2025-05-07T19:49:29.1347473Z + tar -xvf cudnn.tar.xz 2025-05-07T19:49:29.1347640Z 2025-05-07T19:49:29.1378097Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/ 2025-05-07T19:49:29.1379196Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/ 2025-05-07T19:49:29.1380495Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer_static.a 2025-05-07T19:49:31.5299149Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer_static_v8.a 2025-05-07T19:49:31.5300850Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train_static.a 2025-05-07T19:49:33.7914293Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train_static_v8.a 2025-05-07T19:49:33.7914912Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer_static.a 2025-05-07T19:49:42.0254959Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer_static_v8.a 2025-05-07T19:49:42.0255864Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train_static.a 2025-05-07T19:49:43.6222165Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train_static_v8.a 2025-05-07T19:49:43.6223888Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer_static.a 2025-05-07T19:49:45.3159411Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer_static_v8.a 2025-05-07T19:49:45.3161129Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train_static.a 2025-05-07T19:49:46.8327194Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train_static_v8.a 2025-05-07T19:49:46.8327751Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn.so.8 2025-05-07T19:49:46.8328218Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn.so 2025-05-07T19:49:46.8328694Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn.so.8.7.0 2025-05-07T19:49:46.8339483Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer.so.8 2025-05-07T19:49:46.8341607Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer.so 2025-05-07T19:49:46.8343171Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer.so.8.7.0 2025-05-07T19:49:49.2180826Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train.so.8 2025-05-07T19:49:49.2182159Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train.so 2025-05-07T19:49:49.2182752Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train.so.8.7.0 2025-05-07T19:49:51.4789101Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer.so 2025-05-07T19:49:51.4790706Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer.so.8 2025-05-07T19:49:51.4792249Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer.so.8.7.0 2025-05-07T19:50:00.0323187Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train.so 2025-05-07T19:50:00.0323928Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train.so.8.7.0 2025-05-07T19:50:01.6559631Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train.so.8 2025-05-07T19:50:01.6561262Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer.so.8.7.0 2025-05-07T19:50:03.3476404Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer.so 2025-05-07T19:50:03.3477025Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer.so.8 2025-05-07T19:50:03.3477566Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train.so.8.7.0 2025-05-07T19:50:04.8652919Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train.so 2025-05-07T19:50:04.8653549Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train.so.8 2025-05-07T19:50:04.8654076Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/ 2025-05-07T19:50:04.8654550Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_v8.h 2025-05-07T19:50:04.8655156Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_infer_v8.h 2025-05-07T19:50:04.8655845Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_train_v8.h 2025-05-07T19:50:04.8656437Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_backend_v8.h 2025-05-07T19:50:04.8657008Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_infer_v8.h 2025-05-07T19:50:04.8657550Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_train_v8.h 2025-05-07T19:50:04.8658114Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_infer_v8.h 2025-05-07T19:50:04.8658651Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_train_v8.h 2025-05-07T19:50:04.8659223Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_version_v8.h 2025-05-07T19:50:04.8659747Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn.h 2025-05-07T19:50:04.8660239Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_infer.h 2025-05-07T19:50:04.8660784Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_train.h 2025-05-07T19:50:04.8661582Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_backend.h 2025-05-07T19:50:04.8662158Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_infer.h 2025-05-07T19:50:04.8662706Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_train.h 2025-05-07T19:50:04.8663227Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_infer.h 2025-05-07T19:50:04.8663779Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_train.h 2025-05-07T19:50:04.8664293Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_version.h 2025-05-07T19:50:04.8664783Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/LICENSE 2025-05-07T19:50:04.8676050Z 2025-05-07T19:50:04.8676569Z [INSTALL] Moving cuDNN files to /__w/FBGEMM/FBGEMM/build_only/cudnn ... 2025-05-07T19:50:04.8677072Z + rm -rf /__w/FBGEMM/FBGEMM/build_only/cudnn/include 2025-05-07T19:50:04.8677333Z 2025-05-07T19:50:04.8695435Z 2025-05-07T19:50:04.8696692Z + rm -rf /__w/FBGEMM/FBGEMM/build_only/cudnn/lib 2025-05-07T19:50:04.8697044Z 2025-05-07T19:50:04.8709346Z 2025-05-07T19:50:04.8709842Z + mv cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:50:04.8710798Z 2025-05-07T19:50:04.8739895Z 2025-05-07T19:50:04.8741059Z + mv cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:50:04.8742224Z 2025-05-07T19:50:06.5641547Z 2025-05-07T19:50:06.5642097Z /__w/FBGEMM/FBGEMM 2025-05-07T19:50:06.5642912Z + rm -rf /tmp/tmp.nrQ6CGFT9L 2025-05-07T19:50:06.5643448Z 2025-05-07T19:50:06.6135106Z 2025-05-07T19:50:06.6139851Z [INSTALL] Set environment variables CUDNN_INCLUDE_DIR and CUDNN_LIBRARY ... 2025-05-07T19:50:06.6141155Z + conda env config vars set -n build_binary CUDNN_INCLUDE_DIR=/__w/FBGEMM/FBGEMM/build_only/cudnn/include CUDNN_LIBRARY=/__w/FBGEMM/FBGEMM/build_only/cudnn/lib 2025-05-07T19:50:06.6141838Z 2025-05-07T19:50:07.0425691Z 2025-05-07T19:50:07.0426952Z [INSTALL] Successfully installed cuDNN (for CUDA 12.8.0) 2025-05-07T19:50:07.0500424Z ##[group]Run . $PRELUDE; cd fbgemm_gpu; prepare_fbgemm_gpu_build $BUILD_ENV 2025-05-07T19:50:07.0501080Z . $PRELUDE; cd fbgemm_gpu; prepare_fbgemm_gpu_build $BUILD_ENV 2025-05-07T19:50:07.0501872Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:50:07.0502236Z env: 2025-05-07T19:50:07.0502472Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:50:07.0502819Z BUILD_ENV: build_binary 2025-05-07T19:50:07.0503080Z BUILD_TARGET: genai 2025-05-07T19:50:07.0503366Z BUILD_VARIANT: cuda 2025-05-07T19:50:07.0503633Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:50:07.0503886Z ##[endgroup] 2025-05-07T19:50:07.5076411Z ################################################################################ 2025-05-07T19:50:07.5076837Z # Prepare FBGEMM-GPU Build 2025-05-07T19:50:07.5077095Z # 2025-05-07T19:50:07.5100028Z # [2025-05-07T19:50:07.509Z] + prepare_fbgemm_gpu_build build_binary 2025-05-07T19:50:07.5101400Z ################################################################################ 2025-05-07T19:50:07.5102099Z 2025-05-07T19:50:07.5118796Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:50:07.5997359Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:50:07.6012736Z [BUILD] Running git submodules update ... 2025-05-07T19:50:07.6034255Z [EXEC] [ATTEMPT 0/3] + git submodule sync 2025-05-07T19:50:07.6346765Z Synchronizing submodule url for '../external/asmjit' 2025-05-07T19:50:07.6347266Z Synchronizing submodule url for '../external/composable_kernel' 2025-05-07T19:50:07.6347755Z Synchronizing submodule url for '../external/cpuinfo' 2025-05-07T19:50:07.6348196Z Synchronizing submodule url for '../external/cutlass' 2025-05-07T19:50:07.6348629Z Synchronizing submodule url for '../external/googletest' 2025-05-07T19:50:07.6361019Z Synchronizing submodule url for '../external/hipify_torch' 2025-05-07T19:50:07.6361525Z Synchronizing submodule url for '../external/json' 2025-05-07T19:50:07.6384018Z [EXEC] [ATTEMPT 0/3] + git submodule update --init --recursive 2025-05-07T19:50:07.6858233Z [BUILD] Installing other build dependencies ... 2025-05-07T19:50:07.6880401Z [EXEC] [ATTEMPT 0/3] + conda run --no-capture-output -n build_binary python -m pip install -r requirements.txt 2025-05-07T19:50:09.7879363Z Collecting backports.tarfile (from -r requirements.txt (line 13)) 2025-05-07T19:50:09.8054605Z Downloading backports.tarfile-1.2.0-py3-none-any.whl.metadata (2.0 kB) 2025-05-07T19:50:09.8133761Z Requirement already satisfied: build in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 14)) (1.2.2.post1) 2025-05-07T19:50:09.9121514Z Collecting cmake (from -r requirements.txt (line 15)) 2025-05-07T19:50:09.9146141Z Downloading cmake-4.0.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.3 kB) 2025-05-07T19:50:09.9218068Z Requirement already satisfied: click in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 16)) (8.1.8) 2025-05-07T19:50:09.9222118Z Requirement already satisfied: hypothesis in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 17)) (6.131.14) 2025-05-07T19:50:09.9223388Z Requirement already satisfied: jinja2 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 18)) (3.1.6) 2025-05-07T19:50:09.9224608Z Requirement already satisfied: mpmath==1.3.0 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 19)) (1.3.0) 2025-05-07T19:50:09.9512840Z Collecting ninja (from -r requirements.txt (line 20)) 2025-05-07T19:50:09.9542301Z Downloading ninja-1.11.1.4-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (5.0 kB) 2025-05-07T19:50:09.9616729Z Requirement already satisfied: numpy>=2.0.2 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 21)) (2.2.5) 2025-05-07T19:50:09.9758393Z Collecting pyre-extensions (from -r requirements.txt (line 22)) 2025-05-07T19:50:09.9782481Z Downloading pyre_extensions-0.0.32-py3-none-any.whl.metadata (4.0 kB) 2025-05-07T19:50:09.9854446Z Requirement already satisfied: pyyaml in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 23)) (6.0.2) 2025-05-07T19:50:09.9858688Z Requirement already satisfied: scikit-build in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 24)) (0.18.1) 2025-05-07T19:50:09.9862715Z Requirement already satisfied: setuptools in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 25)) (78.1.1) 2025-05-07T19:50:10.0050619Z Collecting setuptools_git_versioning (from -r requirements.txt (line 26)) 2025-05-07T19:50:10.0084462Z Downloading setuptools_git_versioning-2.1.0-py3-none-any.whl.metadata (6.1 kB) 2025-05-07T19:50:10.0261716Z Collecting tabulate (from -r requirements.txt (line 27)) 2025-05-07T19:50:10.0284239Z Downloading tabulate-0.9.0-py3-none-any.whl.metadata (34 kB) 2025-05-07T19:50:10.0528868Z Collecting patchelf (from -r requirements.txt (line 28)) 2025-05-07T19:50:10.0553341Z Downloading patchelf-0.17.2.2-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.musllinux_1_1_x86_64.whl.metadata (3.5 kB) 2025-05-07T19:50:10.0643395Z Requirement already satisfied: packaging>=19.1 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from build->-r requirements.txt (line 14)) (25.0) 2025-05-07T19:50:10.0644774Z Requirement already satisfied: pyproject_hooks in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from build->-r requirements.txt (line 14)) (1.2.0) 2025-05-07T19:50:10.0694249Z Requirement already satisfied: attrs>=22.2.0 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from hypothesis->-r requirements.txt (line 17)) (25.3.0) 2025-05-07T19:50:10.0698972Z Requirement already satisfied: sortedcontainers<3.0.0,>=2.1.0 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from hypothesis->-r requirements.txt (line 17)) (2.4.0) 2025-05-07T19:50:10.0741577Z Requirement already satisfied: MarkupSafe>=2.0 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from jinja2->-r requirements.txt (line 18)) (3.0.2) 2025-05-07T19:50:10.0879451Z Collecting typing-inspect (from pyre-extensions->-r requirements.txt (line 22)) 2025-05-07T19:50:10.0904569Z Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB) 2025-05-07T19:50:10.0975490Z Requirement already satisfied: typing-extensions in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from pyre-extensions->-r requirements.txt (line 22)) (4.13.2) 2025-05-07T19:50:10.0985251Z Requirement already satisfied: distro in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from scikit-build->-r requirements.txt (line 24)) (1.9.0) 2025-05-07T19:50:10.1000698Z Requirement already satisfied: wheel>=0.32.0 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from scikit-build->-r requirements.txt (line 24)) (0.45.1) 2025-05-07T19:50:10.1346040Z Collecting mypy-extensions>=0.3.0 (from typing-inspect->pyre-extensions->-r requirements.txt (line 22)) 2025-05-07T19:50:10.1384857Z Downloading mypy_extensions-1.1.0-py3-none-any.whl.metadata (1.1 kB) 2025-05-07T19:50:10.1483624Z Downloading backports.tarfile-1.2.0-py3-none-any.whl (30 kB) 2025-05-07T19:50:10.1567367Z Downloading cmake-4.0.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.9 MB) 2025-05-07T19:50:10.2683749Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 27.9/27.9 MB 255.0 MB/s eta 0:00:00 2025-05-07T19:50:10.2723705Z Downloading ninja-1.11.1.4-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (422 kB) 2025-05-07T19:50:10.2809652Z Downloading pyre_extensions-0.0.32-py3-none-any.whl (12 kB) 2025-05-07T19:50:10.2876289Z Downloading setuptools_git_versioning-2.1.0-py3-none-any.whl (10 kB) 2025-05-07T19:50:10.2936133Z Downloading tabulate-0.9.0-py3-none-any.whl (35 kB) 2025-05-07T19:50:10.2996067Z Downloading patchelf-0.17.2.2-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.musllinux_1_1_x86_64.whl (466 kB) 2025-05-07T19:50:10.3094066Z Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB) 2025-05-07T19:50:10.3146995Z Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB) 2025-05-07T19:50:10.4663718Z Installing collected packages: tabulate, setuptools_git_versioning, patchelf, ninja, mypy-extensions, cmake, backports.tarfile, typing-inspect, pyre-extensions 2025-05-07T19:50:11.2922163Z 2025-05-07T19:50:11.2971693Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-05-07T19:50:11.2973850Z Successfully installed backports.tarfile-1.2.0 cmake-4.0.0 mypy-extensions-1.1.0 ninja-1.11.1.4 patchelf-0.17.2.2 pyre-extensions-0.0.32 setuptools_git_versioning-2.1.0 tabulate-0.9.0 typing-inspect-0.9.0 2025-05-07T19:50:11.4387417Z ################################################################################ 2025-05-07T19:50:11.4387889Z # Install PyTorch (PyTorch PIP) 2025-05-07T19:50:11.4388223Z # 2025-05-07T19:50:11.4408856Z # [2025-05-07T19:50:11.440Z] + install_triton_pip build_binary 2025-05-07T19:50:11.4409344Z ################################################################################ 2025-05-07T19:50:11.4409589Z 2025-05-07T19:50:11.4409838Z [BUILD] Installing pytorch-triton nightly/3.2.0+git4b3bb1f8 from PIP ... 2025-05-07T19:50:11.4410326Z ################################################################################ 2025-05-07T19:50:11.4410704Z # Install Package From PyTorch PIP: pytorch-triton 2025-05-07T19:50:11.4411087Z # 2025-05-07T19:50:11.4427982Z # [2025-05-07T19:50:11.442Z] + install_from_pytorch_pip build_binary pytorch-triton nightly/3.2.0+git4b3bb1f8 2025-05-07T19:50:11.4429663Z ################################################################################ 2025-05-07T19:50:11.4430354Z 2025-05-07T19:50:11.4453625Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:50:11.5303236Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:50:11.5303653Z ################################################################################ 2025-05-07T19:50:11.5304027Z # Prepare PIP Arguments (PyTorch PIP) 2025-05-07T19:50:11.5304321Z # 2025-05-07T19:50:11.5325721Z # [2025-05-07T19:50:11.531Z] + __prepare_pip_arguments pytorch-triton nightly/3.2.0+git4b3bb1f8 2025-05-07T19:50:11.5326255Z ################################################################################ 2025-05-07T19:50:11.5326506Z 2025-05-07T19:50:11.5377326Z [INSTALL] Extracted package (channel, version): (nightly, 3.2.0+git4b3bb1f8) 2025-05-07T19:50:11.5388954Z [INSTALL] Using a non-RELEASE channel: nightly ... 2025-05-07T19:50:11.5390492Z [INSTALL] Extracted the full PIP channel: https://download.pytorch.org/whl/nightly/ 2025-05-07T19:50:11.5393421Z [INSTALL] Extracted the full PIP package: --pre pytorch-triton==3.2.0+git4b3bb1f8 2025-05-07T19:50:11.5401951Z [INSTALL] Attempting to install [pytorch-triton, 3.2.0+git4b3bb1f8] from PyTorch PIP using channel https://download.pytorch.org/whl/nightly/ ... 2025-05-07T19:50:11.5423435Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install --pre pytorch-triton==3.2.0+git4b3bb1f8 --index-url https://download.pytorch.org/whl/nightly/ 2025-05-07T19:50:16.9668675Z ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. 2025-05-07T19:50:16.9670020Z torch 2.8.0.dev20250507+cu128 requires pytorch-triton==3.3.0+git96316ce5; platform_system == "Linux", but you have pytorch-triton 3.2.0+git4b3bb1f8 which is incompatible. 2025-05-07T19:50:16.9671330Z Looking in indexes: https://download.pytorch.org/whl/nightly/ 2025-05-07T19:50:16.9671776Z Collecting pytorch-triton==3.2.0+git4b3bb1f8 2025-05-07T19:50:16.9672662Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.2.0%2Bgit4b3bb1f8-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.3 kB) 2025-05-07T19:50:16.9674006Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.2.0%2Bgit4b3bb1f8-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (166.5 MB) 2025-05-07T19:50:16.9675494Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 166.5/166.5 MB 198.9 MB/s eta 0:00:00 2025-05-07T19:50:16.9675923Z Installing collected packages: pytorch-triton 2025-05-07T19:50:16.9676284Z Attempting uninstall: pytorch-triton 2025-05-07T19:50:16.9676708Z Found existing installation: pytorch-triton 3.3.0+git96316ce5 2025-05-07T19:50:16.9677170Z Uninstalling pytorch-triton-3.3.0+git96316ce5: 2025-05-07T19:50:16.9677624Z Successfully uninstalled pytorch-triton-3.3.0+git96316ce5 2025-05-07T19:50:16.9679294Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-05-07T19:50:16.9680959Z 2025-05-07T19:50:16.9681139Z Successfully installed pytorch-triton-3.2.0+git4b3bb1f8 2025-05-07T19:50:16.9681397Z 2025-05-07T19:50:19.1052888Z [CHECK] Python (sub-)package 'triton' found ... 2025-05-07T19:50:19.1054122Z [CHECK] Printing out the pytorch-triton version ... 2025-05-07T19:50:21.1383302Z ################################################################################ 2025-05-07T19:50:21.1384554Z [CHECK] The installed VERSION of pytorch-triton is: 3.2.0 2025-05-07T19:50:21.1385748Z ################################################################################ 2025-05-07T19:50:21.1386979Z 2025-05-07T19:50:23.1114508Z [CHECK] Python (sub-)package 'numpy' found ... 2025-05-07T19:50:25.1952346Z [CHECK] Python (sub-)package 'skbuild' found ... 2025-05-07T19:50:25.1953559Z [BUILD] Successfully ran git submodules update 2025-05-07T19:50:25.2043305Z ##[group]Run . $PRELUDE; cd fbgemm_gpu; build_fbgemm_gpu_package $BUILD_ENV nightly genai/cuda 2025-05-07T19:50:25.2044014Z . $PRELUDE; cd fbgemm_gpu; build_fbgemm_gpu_package $BUILD_ENV nightly genai/cuda 2025-05-07T19:50:25.2044643Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:50:25.2044970Z env: 2025-05-07T19:50:25.2045228Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:50:25.2045533Z BUILD_ENV: build_binary 2025-05-07T19:50:25.2045803Z BUILD_TARGET: genai 2025-05-07T19:50:25.2046033Z BUILD_VARIANT: cuda 2025-05-07T19:50:25.2046306Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:50:25.2046558Z ##[endgroup] 2025-05-07T19:50:25.6410641Z [BUILD] BUILD_TARGET_VARIANT: genai/cuda 2025-05-07T19:50:25.6411031Z [BUILD] Extracted build target: genai 2025-05-07T19:50:25.6411372Z [BUILD] Extracted build variant: cuda 2025-05-07T19:50:27.4529186Z /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:50:27.4529508Z 2025-05-07T19:50:27.5277869Z [CHECK] Binary cc found in PATH 2025-05-07T19:50:29.3585263Z /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:50:29.3585565Z 2025-05-07T19:50:29.4172981Z [CHECK] Binary gcc found in PATH 2025-05-07T19:50:31.2407630Z /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:50:31.2407982Z 2025-05-07T19:50:31.3171705Z [CHECK] Binary c++ found in PATH 2025-05-07T19:50:33.1469243Z /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:50:33.1469596Z 2025-05-07T19:50:33.2215767Z [CHECK] Binary g++ found in PATH 2025-05-07T19:50:35.1031059Z [BUILD] Extracted and set Python tag: py313 2025-05-07T19:50:35.1031637Z [BUILD] Extracted and set Python platform name: manylinux_2_28_x86_64 2025-05-07T19:50:35.1285207Z core = 24 2025-05-07T19:50:35.1492001Z sockets = 2 2025-05-07T19:50:35.1492918Z [BUILD] Set multicore run option for setup.py: -j 48 2025-05-07T19:50:35.1493967Z [CHECK] LD_LIBRARY_PATH = 2025-05-07T19:50:35.1494742Z [BUILD] Running pre-build cleanups ... 2025-05-07T19:50:35.1495860Z + rm -rf dist 2025-05-07T19:50:35.1496217Z 2025-05-07T19:50:35.1509957Z 2025-05-07T19:50:35.1510540Z + conda run --no-capture-output -n build_binary python setup.py clean 2025-05-07T19:50:35.1510895Z 2025-05-07T19:50:38.2915899Z INFO:root:running clean 2025-05-07T19:50:38.2916265Z [SETUP.PY] ARGV: ['setup.py', 'clean'] 2025-05-07T19:50:38.2917335Z [SETUP.PY] Parsed setup.py arguments: Namespace(verbose=False, debug=False, dryrun=False, build_target='default', build_variant='cuda', package_channel='nightly', nvml_lib_path=None, nccl_lib_path=None, use_fb_only=False, cxxprefix=None) 2025-05-07T19:50:38.2918408Z [SETUP.PY] Other arguments: ['clean'] 2025-05-07T19:50:38.2918881Z [SETUP.PY] CUDA CUB directory environment variable not set. Using default CUB location. 2025-05-07T19:50:38.2919491Z [SETUP.PY] Using CUDA = /github/home/miniconda/envs/build_binary 2025-05-07T19:50:38.2920066Z [SETUP.PY] Generating version file at: /__w/FBGEMM/FBGEMM/fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:50:38.2920648Z [SETUP.PY] Setting the FBGEMM build target: default ... 2025-05-07T19:50:38.2921067Z [SETUP.PY] Setting the FBGEMM build variant: cuda ... 2025-05-07T19:50:38.2922331Z [SETUP.PY] Passing CMake arguments: ['-DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch', '-D_GLIBCXX_USE_CXX11_ABI=1', '-DFBGEMM_BUILD_TARGET=default', '-DFBGEMM_BUILD_VARIANT=cuda', "-DCMAKE_C_FLAGS=''", "-DCMAKE_CXX_FLAGS=''"] 2025-05-07T19:50:38.6838193Z 2025-05-07T19:50:38.6838917Z [BUILD] Printing git status ... 2025-05-07T19:50:38.6839475Z + git status 2025-05-07T19:50:38.6839606Z 2025-05-07T19:50:39.2327488Z HEAD detached at pull/4066/merge 2025-05-07T19:50:39.2328628Z Untracked files: 2025-05-07T19:50:39.2328943Z (use "git add ..." to include in what will be committed) 2025-05-07T19:50:39.2329333Z ../build_only/ 2025-05-07T19:50:39.2329574Z ../collect_env.py 2025-05-07T19:50:39.2329816Z fbgemm_gpu/docs/version.py 2025-05-07T19:50:39.2329993Z 2025-05-07T19:50:39.2330466Z nothing added to commit but untracked files present (use "git add" to track) 2025-05-07T19:50:39.2330825Z 2025-05-07T19:50:39.2330917Z + git diff 2025-05-07T19:50:39.2331057Z 2025-05-07T19:50:39.2607386Z 2025-05-07T19:50:39.2607956Z ################################################################################ 2025-05-07T19:50:39.2609013Z # Configure FBGEMM-GPU Build 2025-05-07T19:50:39.2609601Z # 2025-05-07T19:50:39.2624989Z # [2025-05-07T19:50:39.262Z] + __configure_fbgemm_gpu_build 2025-05-07T19:50:39.2625873Z ################################################################################ 2025-05-07T19:50:39.2630009Z 2025-05-07T19:50:39.2630398Z [BUILD] Setting the build target: genai ... 2025-05-07T19:50:39.2631762Z [BUILD] Configuring build as CUDA variant (this is the default behavior) ... 2025-05-07T19:50:41.0930636Z /github/home/miniconda/envs/build_binary/bin/nvcc 2025-05-07T19:50:41.0931117Z 2025-05-07T19:50:41.1522436Z [CHECK] Binary nvcc found in PATH 2025-05-07T19:50:42.9568940Z /__w/FBGEMM/FBGEMM/build_only/cudnn/include 2025-05-07T19:50:42.9569628Z 2025-05-07T19:50:43.0128904Z [CHECK] Environment variable CUDNN_INCLUDE_DIR is defined in the Conda environment 2025-05-07T19:50:44.8378490Z /__w/FBGEMM/FBGEMM/build_only/cudnn/lib 2025-05-07T19:50:44.8954307Z 2025-05-07T19:50:44.8955287Z [CHECK] Environment variable CUDNN_LIBRARY is defined in the Conda environment 2025-05-07T19:50:46.7310488Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:50:46.7310887Z 2025-05-07T19:50:46.8079015Z [CHECK] Environment variable NVML_LIB_PATH is defined in the Conda environment 2025-05-07T19:50:48.6770539Z [BUILD] Using the default architectures for CUDA nvcc: NVIDIA (R) Cuda compiler driver 2025-05-07T19:50:48.6772164Z Copyright (c) 2005-2025 NVIDIA Corporation 2025-05-07T19:50:48.6773101Z Built on Wed_Jan_15_19:20:09_PST_2025 2025-05-07T19:50:48.6774051Z Cuda compilation tools, release 12.8, V12.8.61 2025-05-07T19:50:48.6775688Z Build cuda_12.8.r12.8/compiler.35404655_0 ... 2025-05-07T19:50:48.6776957Z [BUILD] Setting the following CUDA targets: 7.0;8.0;9.0;9.0a;10.0a;12.0a 2025-05-07T19:50:48.6778180Z [BUILD] Looking up NVML filepath ... 2025-05-07T19:50:50.5572163Z [BUILD] Looking up NCCL filepath ... 2025-05-07T19:50:54.3627941Z [BUILD] Setting NVCC verbose mode ... 2025-05-07T19:50:54.3629067Z + conda env config vars set -n build_binary NVCC_VERBOSE=1 2025-05-07T19:50:54.3629377Z 2025-05-07T19:50:54.7700914Z 2025-05-07T19:50:54.7701220Z [BUILD] Setting CUDA build args ... 2025-05-07T19:50:56.6457776Z [BUILD] Looking up CUDA version ... 2025-05-07T19:51:00.4081524Z + conda run -n build_binary c++ --version | grep -i clang 2025-05-07T19:51:00.4081833Z 2025-05-07T19:51:02.3377847Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:51:02.3380604Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang++.cfg 2025-05-07T19:51:02.3381106Z 2025-05-07T19:51:02.3381254Z [BUILD] Setting NVCC flags ... 2025-05-07T19:51:02.3382209Z + conda env config vars set -n build_binary NVCC_PREPEND_FLAGS="-std=c++20 -Xcompiler -std=c++20 -Xcompiler -stdlib=libstdc++ -ccbin /github/home/miniconda/envs/build_binary/bin/c++ -allow-unsupported-compiler" 2025-05-07T19:51:02.3383056Z 2025-05-07T19:51:02.7663907Z 2025-05-07T19:51:02.7664338Z + conda run -n build_binary printenv NVCC_PREPEND_FLAGS 2025-05-07T19:51:02.7664662Z 2025-05-07T19:51:04.6005558Z -std=c++20 -Xcompiler -std=c++20 -Xcompiler -stdlib=libstdc++ -ccbin /github/home/miniconda/envs/build_binary/bin/c++ -allow-unsupported-compiler 2025-05-07T19:51:04.6007537Z 2025-05-07T19:51:04.6746532Z 2025-05-07T19:51:04.6748205Z [BUILD] Setting CUDA build args ... 2025-05-07T19:51:04.6749249Z + conda run -n build_binary c++ --version 2025-05-07T19:51:04.6749880Z 2025-05-07T19:51:06.5252386Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:51:06.5253388Z Target: x86_64-conda-linux-gnu 2025-05-07T19:51:06.5253709Z Thread model: posix 2025-05-07T19:51:06.5254044Z InstalledDir: /github/home/miniconda/envs/build_binary/bin 2025-05-07T19:51:06.5254693Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang++.cfg 2025-05-07T19:51:06.5255157Z 2025-05-07T19:51:06.5986701Z 2025-05-07T19:51:06.5987159Z + conda run -n build_binary c++ --version | grep -i clang 2025-05-07T19:51:06.5987505Z 2025-05-07T19:51:08.5166673Z clang version 16.0.6 (https://github.com/conda-forge/clangdev-feedstock db6970f6bb85e49860ed8bab43ebf165b5c55cc4) 2025-05-07T19:51:08.5167642Z Configuration file: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-clang++.cfg 2025-05-07T19:51:08.5168144Z 2025-05-07T19:51:08.5168356Z [BUILD] Clang is available; configuring for Clang-based build ... 2025-05-07T19:51:10.4164394Z .github/scripts/fbgemm_gpu_build.bash: line 370: [: : integer expression expected 2025-05-07T19:51:10.4165249Z [BUILD] Enabling debug features in the build ... 2025-05-07T19:51:10.4167739Z [BUILD] FBGEMM_GPU build arguments have been set: --verbose --build-target=genai --build-variant=cuda --nvml_lib_path=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so --nccl_lib_path=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 -DTORCH_CUDA_ARCH_LIST='7.0;8.0;9.0;9.0a;10.0a;12.0a' -DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCMAKE_CXX_STANDARD=20 --cxxprefix=/github/home/miniconda/envs/build_binary --debug 2025-05-07T19:51:10.4170239Z ################################################################################ 2025-05-07T19:51:10.4170565Z # Build FBGEMM-GPU Package (Wheel) 2025-05-07T19:51:10.4170854Z # 2025-05-07T19:51:10.4184227Z # [2025-05-07T19:51:10.417Z] + build_fbgemm_gpu_package build_binary nightly genai/cuda 2025-05-07T19:51:10.4186775Z ################################################################################ 2025-05-07T19:51:10.4187476Z 2025-05-07T19:51:10.4188001Z [BUILD] Building FBGEMM wheel (TARGET=genai, VARIANT=cuda) ... 2025-05-07T19:51:10.4195192Z + conda run --no-capture-output -n build_binary python -m build --wheel --no-isolation --config-setting=--build-option=--verbose --config-setting=--build-option=--build-target=genai --config-setting=--build-option=--build-variant=cuda --config-setting=--build-option=--nvml_lib_path=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so --config-setting=--build-option=--nccl_lib_path=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 --config-setting=--build-option=-DTORCH_CUDA_ARCH_LIST='7.0;8.0;9.0;9.0a;10.0a;12.0a' --config-setting=--build-option=-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux --config-setting=--build-option=-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux --config-setting=--build-option=-DCMAKE_CXX_STANDARD=20 --config-setting=--build-option=--cxxprefix=/github/home/miniconda/envs/build_binary --config-setting=--build-option=--debug --config-setting=--build-option=--package_channel=nightly --config-setting=--build-option=--python-tag=py313 --config-setting=--build-option=--plat-name=manylinux_2_28_x86_64 2025-05-07T19:51:10.4199893Z 2025-05-07T19:51:12.2750740Z * Getting build dependencies for wheel... 2025-05-07T19:51:13.6045757Z INFO:root:running egg_info 2025-05-07T19:51:13.6083327Z INFO:root:creating fbgemm_gpu_nightly.egg-info 2025-05-07T19:51:13.6084107Z INFO:root:writing fbgemm_gpu_nightly.egg-info/PKG-INFO 2025-05-07T19:51:13.6087409Z INFO:root:writing dependency_links to fbgemm_gpu_nightly.egg-info/dependency_links.txt 2025-05-07T19:51:13.6088871Z INFO:root:writing requirements to fbgemm_gpu_nightly.egg-info/requires.txt 2025-05-07T19:51:13.6089871Z INFO:root:writing top-level names to fbgemm_gpu_nightly.egg-info/top_level.txt 2025-05-07T19:51:13.6091116Z INFO:root:writing manifest file 'fbgemm_gpu_nightly.egg-info/SOURCES.txt' 2025-05-07T19:51:13.6157093Z INFO:root:reading manifest file 'fbgemm_gpu_nightly.egg-info/SOURCES.txt' 2025-05-07T19:51:13.6166802Z INFO:root:writing manifest file 'fbgemm_gpu_nightly.egg-info/SOURCES.txt' 2025-05-07T19:51:13.6170112Z [SETUP.PY] ARGV: ['setup.py', 'egg_info'] 2025-05-07T19:51:13.6171473Z [SETUP.PY] Parsed setup.py arguments: Namespace(verbose=False, debug=False, dryrun=False, build_target='default', build_variant='cuda', package_channel='nightly', nvml_lib_path=None, nccl_lib_path=None, use_fb_only=False, cxxprefix=None) 2025-05-07T19:51:13.6172553Z [SETUP.PY] Other arguments: ['egg_info'] 2025-05-07T19:51:13.6173078Z [SETUP.PY] CUDA CUB directory environment variable not set. Using default CUB location. 2025-05-07T19:51:13.6173645Z [SETUP.PY] Using CUDA = /github/home/miniconda/envs/build_binary 2025-05-07T19:51:13.6174250Z [SETUP.PY] Generating version file at: /__w/FBGEMM/FBGEMM/fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:51:13.6174999Z [SETUP.PY] Setting the FBGEMM build target: default ... 2025-05-07T19:51:13.6175480Z [SETUP.PY] Setting the FBGEMM build variant: cuda ... 2025-05-07T19:51:13.6176762Z [SETUP.PY] Passing CMake arguments: ['-DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch', '-D_GLIBCXX_USE_CXX11_ABI=1', '-DFBGEMM_BUILD_TARGET=default', '-DFBGEMM_BUILD_VARIANT=cuda', "-DCMAKE_C_FLAGS=''", "-DCMAKE_CXX_FLAGS=''"] 2025-05-07T19:51:13.9662563Z * Building wheel... 2025-05-07T19:51:15.2886569Z [SETUP.PY] ARGV: ['setup.py', 'bdist_wheel', '--dist-dir', '/__w/FBGEMM/FBGEMM/fbgemm_gpu/dist/.tmp-99kfrfao', '--verbose', '--build-target=genai', '--build-variant=cuda', '--nvml_lib_path=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so', '--nccl_lib_path=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2', '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a', '-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCMAKE_CXX_STANDARD=20', '--cxxprefix=/github/home/miniconda/envs/build_binary', '--debug', '--package_channel=nightly', '--python-tag=py313', '--plat-name=manylinux_2_28_x86_64'] 2025-05-07T19:51:15.2891396Z [SETUP.PY] Parsed setup.py arguments: Namespace(verbose=True, debug=True, dryrun=False, build_target='genai', build_variant='cuda', package_channel='nightly', nvml_lib_path='/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so', nccl_lib_path='/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2', use_fb_only=False, cxxprefix='/github/home/miniconda/envs/build_binary') 2025-05-07T19:51:15.2894705Z [SETUP.PY] Other arguments: ['bdist_wheel', '--dist-dir', '/__w/FBGEMM/FBGEMM/fbgemm_gpu/dist/.tmp-99kfrfao', '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a', '-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCMAKE_CXX_STANDARD=20', '--python-tag=py313', '--plat-name=manylinux_2_28_x86_64'] 2025-05-07T19:51:15.2896695Z [SETUP.PY] CUDA CUB directory environment variable not set. Using default CUB location. 2025-05-07T19:51:15.2897253Z [SETUP.PY] Using CUDA = /github/home/miniconda/envs/build_binary 2025-05-07T19:51:15.2897838Z [SETUP.PY] Generating version file at: /__w/FBGEMM/FBGEMM/fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:51:15.2898382Z [SETUP.PY] Setting the FBGEMM build target: genai ... 2025-05-07T19:51:15.2898789Z [SETUP.PY] Setting the FBGEMM build variant: cuda ... 2025-05-07T19:51:15.2905528Z [SETUP.PY] Passing CMake arguments: ['-DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch', '-D_GLIBCXX_USE_CXX11_ABI=1', '-DCMAKE_VERBOSE_MAKEFILE=ON', '-DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE', '-DFBGEMM_BUILD_TARGET=genai', '-DFBGEMM_BUILD_VARIANT=cuda', '-DNVML_LIB_PATH=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so', '-DNCCL_INCLUDE_DIRS=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include', '-DNCCL_LIBRARIES=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2', '-DCMAKE_C_COMPILER=/github/home/miniconda/envs/build_binary/bin/cc', '-DCMAKE_CXX_COMPILER=/github/home/miniconda/envs/build_binary/bin/c++', "-DCMAKE_C_FLAGS='-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include'", "-DCMAKE_CXX_FLAGS='-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include'", '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a', '-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCMAKE_CXX_STANDARD=20'] 2025-05-07T19:51:15.2911704Z 2025-05-07T19:51:15.2911708Z 2025-05-07T19:51:15.2911868Z -------------------------------------------------------------------------------- 2025-05-07T19:51:15.2912236Z -- Trying 'Ninja' generator 2025-05-07T19:51:15.2912485Z -------------------------------- 2025-05-07T19:51:15.2912746Z --------------------------- 2025-05-07T19:51:15.2912975Z ---------------------- 2025-05-07T19:51:15.2913198Z ----------------- 2025-05-07T19:51:15.2913402Z ------------ 2025-05-07T19:51:15.2913611Z ------- 2025-05-07T19:51:15.2913789Z -- 2025-05-07T19:51:15.3299050Z CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required): 2025-05-07T19:51:15.3299621Z Not searching for unused variables given on the command line. 2025-05-07T19:51:15.3300186Z Compatibility with CMake < 3.10 will be removed from a future version of 2025-05-07T19:51:15.3300597Z CMake. 2025-05-07T19:51:15.3300728Z 2025-05-07T19:51:15.3300951Z Update the VERSION argument value. Or, use the ... syntax 2025-05-07T19:51:15.3301499Z to tell CMake that the project requires at least but has been updated 2025-05-07T19:51:15.3301986Z to work with policies introduced by or earlier. 2025-05-07T19:51:15.3302241Z 2025-05-07T19:51:15.3302246Z 2025-05-07T19:51:15.4074132Z -- The C compiler identification is Clang 16.0.6 2025-05-07T19:51:15.4161857Z -- Detecting C compiler ABI info 2025-05-07T19:51:15.5350362Z -- Detecting C compiler ABI info - done 2025-05-07T19:51:15.5480269Z -- Check for working C compiler: /github/home/miniconda/envs/build_binary/bin/cc - skipped 2025-05-07T19:51:15.5482531Z -- Detecting C compile features 2025-05-07T19:51:15.5486712Z -- Detecting C compile features - done 2025-05-07T19:51:15.6843123Z -- The CXX compiler identification is Clang 16.0.6 2025-05-07T19:51:15.6914788Z -- Detecting CXX compiler ABI info 2025-05-07T19:51:15.8172062Z -- Detecting CXX compiler ABI info - done 2025-05-07T19:51:15.8307836Z -- Check for working CXX compiler: /github/home/miniconda/envs/build_binary/bin/c++ - skipped 2025-05-07T19:51:15.8310609Z -- Detecting CXX compile features 2025-05-07T19:51:15.8318629Z -- Detecting CXX compile features - done 2025-05-07T19:51:15.8334569Z -- Configuring done (0.5s) 2025-05-07T19:51:15.8382033Z -- Generating done (0.0s) 2025-05-07T19:51:15.8392576Z -- Build files have been written to: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_cmake_test_compile/build 2025-05-07T19:51:15.8441093Z -- 2025-05-07T19:51:15.8441416Z ------- 2025-05-07T19:51:15.8441624Z ------------ 2025-05-07T19:51:15.8442073Z ----------------- 2025-05-07T19:51:15.8442292Z ---------------------- 2025-05-07T19:51:15.8442554Z --------------------------- 2025-05-07T19:51:15.8442805Z -------------------------------- 2025-05-07T19:51:15.8443101Z -- Trying 'Ninja' generator - success 2025-05-07T19:51:15.8443622Z -------------------------------------------------------------------------------- 2025-05-07T19:51:15.8443910Z 2025-05-07T19:51:15.8457386Z Configuring Project 2025-05-07T19:51:15.8457661Z Working directory: 2025-05-07T19:51:15.8458043Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build 2025-05-07T19:51:15.8458464Z Command: 2025-05-07T19:51:15.8479658Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/cmake/data/bin/cmake /__w/FBGEMM/FBGEMM/fbgemm_gpu -G Ninja -DCMAKE_MAKE_PROGRAM:FILEPATH=/github/home/miniconda/envs/build_binary/bin/ninja --no-warn-unused-cli -DCMAKE_INSTALL_PREFIX:PATH=/__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install -DPYTHON_VERSION_STRING:STRING=3.13.2 -DSKBUILD:INTERNAL=TRUE -DCMAKE_MODULE_PATH:PATH=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/skbuild/resources/cmake -DPYTHON_EXECUTABLE:PATH=/github/home/miniconda/envs/build_binary/bin/python -DPYTHON_INCLUDE_DIR:PATH=/github/home/miniconda/envs/build_binary/include/python3.13 -DPYTHON_LIBRARY:PATH=/github/home/miniconda/envs/build_binary/lib/libpython3.13.so -DPython_EXECUTABLE:PATH=/github/home/miniconda/envs/build_binary/bin/python -DPython_ROOT_DIR:PATH=/github/home/miniconda/envs/build_binary -DPython_FIND_REGISTRY:STRING=NEVER -DPython_INCLUDE_DIR:PATH=/github/home/miniconda/envs/build_binary/include/python3.13 -DPython_NumPy_INCLUDE_DIRS:PATH=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/numpy/_core/include -DPython3_EXECUTABLE:PATH=/github/home/miniconda/envs/build_binary/bin/python -DPython3_ROOT_DIR:PATH=/github/home/miniconda/envs/build_binary -DPython3_FIND_REGISTRY:STRING=NEVER -DPython3_INCLUDE_DIR:PATH=/github/home/miniconda/envs/build_binary/include/python3.13 -DPython3_NumPy_INCLUDE_DIRS:PATH=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/numpy/_core/include -DCMAKE_MAKE_PROGRAM:FILEPATH=/github/home/miniconda/envs/build_binary/bin/ninja -DCMAKE_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ar -DCMAKE_CXX_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_C_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ranlib -DCMAKE_CXX_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_C_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_LINKER=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-strip -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch -D_GLIBCXX_USE_CXX11_ABI=1 -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DFBGEMM_BUILD_TARGET=genai -DFBGEMM_BUILD_VARIANT=cuda -DNVML_LIB_PATH=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -DNCCL_INCLUDE_DIRS=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -DNCCL_LIBRARIES=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 -DCMAKE_C_COMPILER=/github/home/miniconda/envs/build_binary/bin/cc -DCMAKE_CXX_COMPILER=/github/home/miniconda/envs/build_binary/bin/c++ '-DCMAKE_C_FLAGS='"'"'-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include'"'"'' '-DCMAKE_CXX_FLAGS='"'"'-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include'"'"'' '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a' -DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCMAKE_CXX_STANDARD=20 '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a' -DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCMAKE_CXX_STANDARD=20 -DCMAKE_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ar -DCMAKE_CXX_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_C_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ranlib -DCMAKE_CXX_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_C_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_LINKER=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-strip -DCMAKE_BUILD_TYPE=Release 2025-05-07T19:51:15.8500230Z 2025-05-07T19:51:15.8861852Z 2025-05-07T19:51:15.8861863Z 2025-05-07T19:51:15.8862276Z ================================================================================ 2025-05-07T19:51:15.8862700Z Default C compiler flags 2025-05-07T19:51:15.8863057Z (values may be overridden by CMAKE_CXX_STANDARD and CXX_STANDARD): 2025-05-07T19:51:15.8863362Z 2025-05-07T19:51:15.8864241Z -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include 2025-05-07T19:51:15.8865337Z ================================================================================ 2025-05-07T19:51:15.8865589Z 2025-05-07T19:51:15.8865594Z 2025-05-07T19:51:15.8865598Z 2025-05-07T19:51:15.8865730Z ================================================================================ 2025-05-07T19:51:15.8866077Z Default C++ compiler flags 2025-05-07T19:51:15.8866431Z (values may be overridden by CMAKE_CXX_STANDARD and CXX_STANDARD): 2025-05-07T19:51:15.8866743Z 2025-05-07T19:51:15.8867585Z -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include 2025-05-07T19:51:15.8868650Z ================================================================================ 2025-05-07T19:51:15.8868878Z 2025-05-07T19:51:15.8868882Z 2025-05-07T19:51:15.8868886Z 2025-05-07T19:51:15.8868997Z ================================================================================ 2025-05-07T19:51:15.8869332Z AVX2_FLAGS: 2025-05-07T19:51:15.8869449Z 2025-05-07T19:51:15.8869527Z -mavx2 2025-05-07T19:51:15.8869726Z -mf16c 2025-05-07T19:51:15.8869907Z -mfma 2025-05-07T19:51:15.8870108Z -fopenmp 2025-05-07T19:51:15.8870345Z ================================================================================ 2025-05-07T19:51:15.8870572Z 2025-05-07T19:51:15.8870754Z Not searching for unused variables given on the command line. 2025-05-07T19:51:15.8871090Z 2025-05-07T19:51:15.8871094Z 2025-05-07T19:51:15.8871208Z ================================================================================ 2025-05-07T19:51:15.8871530Z AVX512_FLAGS: 2025-05-07T19:51:15.8871653Z 2025-05-07T19:51:15.8871733Z -mavx2 2025-05-07T19:51:15.8871931Z -mf16c 2025-05-07T19:51:15.8872113Z -mfma 2025-05-07T19:51:15.8872317Z -mavx512f 2025-05-07T19:51:15.8872512Z -mavx512bw 2025-05-07T19:51:15.8872724Z -mavx512dq 2025-05-07T19:51:15.8872916Z -mavx512vl 2025-05-07T19:51:15.8873392Z -fopenmp 2025-05-07T19:51:15.8873611Z ================================================================================ 2025-05-07T19:51:15.8873859Z 2025-05-07T19:51:15.8873862Z 2025-05-07T19:51:15.8873866Z 2025-05-07T19:51:15.8874110Z ================================================================================ 2025-05-07T19:51:15.8874461Z The project is built using scikit-build 2025-05-07T19:51:15.8874974Z ================================================================================ 2025-05-07T19:51:15.8875217Z 2025-05-07T19:51:15.8875221Z 2025-05-07T19:51:15.8875225Z 2025-05-07T19:51:15.8875418Z ================================================================================ 2025-05-07T19:51:15.8875729Z Build Settings 2025-05-07T19:51:15.8875870Z 2025-05-07T19:51:15.8875971Z FBGEMM_BUILD_TARGET : genai 2025-05-07T19:51:15.8876262Z FBGEMM_BUILD_VARIANT : cuda 2025-05-07T19:51:15.8876439Z 2025-05-07T19:51:15.8876533Z NVCC_VERBOSE : 2025-05-07T19:51:15.8876795Z CUDNN_INCLUDE_DIR : 2025-05-07T19:51:15.8877048Z CUDNN_LIBRARY : 2025-05-07T19:51:15.8877482Z NVML_LIB_PATH : /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:51:15.8877961Z TORCH_CUDA_ARCH_LIST : 7.0 2025-05-07T19:51:15.8878222Z 8.0 2025-05-07T19:51:15.8878402Z 9.0 2025-05-07T19:51:15.8878598Z 9.0a 2025-05-07T19:51:15.8878794Z 10.0a 2025-05-07T19:51:15.8878979Z 12.0a 2025-05-07T19:51:15.8879086Z 2025-05-07T19:51:15.8879193Z HIP_ROOT_DIR : 2025-05-07T19:51:15.8879437Z HIPCC_VERBOSE : 2025-05-07T19:51:15.8879697Z AMDGPU_TARGETS : 2025-05-07T19:51:15.8879943Z PYTORCH_ROCM_ARCH : 2025-05-07T19:51:15.8880226Z ================================================================================ 2025-05-07T19:51:15.8880453Z 2025-05-07T19:51:16.0293372Z -- The CXX compiler identification is Clang 16.0.6 2025-05-07T19:51:16.0980536Z -- The C compiler identification is Clang 16.0.6 2025-05-07T19:51:17.1531091Z -- The CUDA compiler identification is NVIDIA 12.8.61 with host compiler Clang 16.0.6 2025-05-07T19:51:17.1639939Z -- Detecting CXX compiler ABI info 2025-05-07T19:51:17.2918759Z -- Detecting CXX compiler ABI info - done 2025-05-07T19:51:17.3052534Z -- Check for working CXX compiler: /github/home/miniconda/envs/build_binary/bin/c++ - skipped 2025-05-07T19:51:17.3053079Z -- Detecting CXX compile features 2025-05-07T19:51:17.3060890Z -- Detecting CXX compile features - done 2025-05-07T19:51:17.3135213Z -- Detecting C compiler ABI info 2025-05-07T19:51:17.4330060Z -- Detecting C compiler ABI info - done 2025-05-07T19:51:17.4461775Z -- Check for working C compiler: /github/home/miniconda/envs/build_binary/bin/cc - skipped 2025-05-07T19:51:17.4464585Z -- Detecting C compile features 2025-05-07T19:51:17.4468873Z -- Detecting C compile features - done 2025-05-07T19:51:17.4518829Z -- Detecting CUDA compiler ABI info 2025-05-07T19:51:18.4596709Z -- Detecting CUDA compiler ABI info - done 2025-05-07T19:51:18.5137049Z -- Check for working CUDA compiler: /github/home/miniconda/envs/build_binary/bin/nvcc - skipped 2025-05-07T19:51:18.5159777Z -- Detecting CUDA compile features 2025-05-07T19:51:18.5161922Z -- Detecting CUDA compile features - done 2025-05-07T19:51:18.5182332Z -- Performing Test C_HAS_AVX_1 2025-05-07T19:51:18.8039780Z -- Performing Test C_HAS_AVX_1 - Failed 2025-05-07T19:51:18.8040188Z -- Performing Test C_HAS_AVX_2 2025-05-07T19:51:19.1282982Z -- Performing Test C_HAS_AVX_2 - Success 2025-05-07T19:51:19.1283398Z -- Performing Test C_HAS_AVX2_1 2025-05-07T19:51:19.4136107Z -- Performing Test C_HAS_AVX2_1 - Failed 2025-05-07T19:51:19.4136935Z -- Performing Test C_HAS_AVX2_2 2025-05-07T19:51:19.7393053Z -- Performing Test C_HAS_AVX2_2 - Success 2025-05-07T19:51:19.7393627Z -- Performing Test C_HAS_AVX512_1 2025-05-07T19:51:20.0248943Z -- Performing Test C_HAS_AVX512_1 - Failed 2025-05-07T19:51:20.0249368Z -- Performing Test C_HAS_AVX512_2 2025-05-07T19:51:20.3507844Z -- Performing Test C_HAS_AVX512_2 - Success 2025-05-07T19:51:20.3508541Z -- Performing Test CXX_HAS_AVX_1 2025-05-07T19:51:20.6359577Z -- Performing Test CXX_HAS_AVX_1 - Failed 2025-05-07T19:51:20.6360010Z -- Performing Test CXX_HAS_AVX_2 2025-05-07T19:51:20.9603150Z -- Performing Test CXX_HAS_AVX_2 - Success 2025-05-07T19:51:20.9603562Z -- Performing Test CXX_HAS_AVX2_1 2025-05-07T19:51:21.2453250Z -- Performing Test CXX_HAS_AVX2_1 - Failed 2025-05-07T19:51:21.2453722Z -- Performing Test CXX_HAS_AVX2_2 2025-05-07T19:51:21.5712116Z -- Performing Test CXX_HAS_AVX2_2 - Success 2025-05-07T19:51:21.5712522Z -- Performing Test CXX_HAS_AVX512_1 2025-05-07T19:51:21.8571828Z -- Performing Test CXX_HAS_AVX512_1 - Failed 2025-05-07T19:51:21.8572239Z -- Performing Test CXX_HAS_AVX512_2 2025-05-07T19:51:22.1873427Z -- Performing Test CXX_HAS_AVX512_2 - Success 2025-05-07T19:51:22.2051031Z -- Found CUDA: /github/home/miniconda/envs/build_binary/targets/x86_64-linux (found version "12.8") 2025-05-07T19:51:22.2086041Z -- Found CUDAToolkit: /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include (found version "12.8.61") 2025-05-07T19:51:22.2167374Z -- Performing Test CMAKE_HAVE_LIBC_PTHREAD 2025-05-07T19:51:22.3384414Z -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success 2025-05-07T19:51:22.3398534Z -- Found Threads: TRUE 2025-05-07T19:51:22.5040339Z -- PyTorch: CUDA detected: 12.8 2025-05-07T19:51:22.5040981Z -- PyTorch: CUDA nvcc is: /github/home/miniconda/envs/build_binary/targets/x86_64-linux/bin/nvcc 2025-05-07T19:51:22.5041740Z -- PyTorch: CUDA toolkit directory: /github/home/miniconda/envs/build_binary/targets/x86_64-linux 2025-05-07T19:51:22.6538364Z -- PyTorch: Header version is: 12.8 2025-05-07T19:51:22.7538057Z -- Found Python: /github/home/miniconda/envs/build_binary/bin/python (found version "3.13.2") found components: Interpreter 2025-05-07T19:51:22.7565438Z CMake Warning at /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:140 (message): 2025-05-07T19:51:22.7567953Z -- USE_CUDNN is set to 0. Compiling without cuDNN support 2025-05-07T19:51:22.7569341Z -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support 2025-05-07T19:51:22.7570654Z -- USE_CUDSS is set to 0. Compiling without cuDSS support 2025-05-07T19:51:22.7571777Z -- USE_CUFILE is set to 0. Compiling without cuFile support 2025-05-07T19:51:22.7572208Z Failed to compute shorthash for libnvrtc.so 2025-05-07T19:51:22.7572558Z Call Stack (most recent call first): 2025-05-07T19:51:22.7573252Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include) 2025-05-07T19:51:22.7574386Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package) 2025-05-07T19:51:22.7575649Z /__w/FBGEMM/FBGEMM/cmake/modules/PyTorchSetup.cmake:14 (find_package) 2025-05-07T19:51:22.7576100Z CMakeLists.txt:112 (include) 2025-05-07T19:51:22.7576284Z 2025-05-07T19:51:22.7576305Z 2025-05-07T19:51:22.7577149Z -- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_90a,code=sm_90a;-gencode;arch=compute_100a,code=sm_100a;-gencode;arch=compute_120a,code=sm_120a 2025-05-07T19:51:22.7940202Z CMake Warning at /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message): 2025-05-07T19:51:22.7942340Z static library kineto_LIBRARY-NOTFOUND not found. 2025-05-07T19:51:22.7942711Z Call Stack (most recent call first): 2025-05-07T19:51:22.7943578Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:125 (append_torchlib_if_found) 2025-05-07T19:51:22.7944447Z /__w/FBGEMM/FBGEMM/cmake/modules/PyTorchSetup.cmake:14 (find_package) 2025-05-07T19:51:22.7944875Z CMakeLists.txt:112 (include) 2025-05-07T19:51:22.7945046Z 2025-05-07T19:51:22.7945051Z 2025-05-07T19:51:22.7945993Z -- Found Torch: /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so 2025-05-07T19:51:22.7948882Z 2025-05-07T19:51:22.7948887Z 2025-05-07T19:51:22.7949025Z ================================================================================ 2025-05-07T19:51:22.7949364Z PyTorch Flags: 2025-05-07T19:51:22.7949590Z 2025-05-07T19:51:22.7949944Z TORCH_INCLUDE_DIRS: 2025-05-07T19:51:22.7950371Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include 2025-05-07T19:51:22.7951172Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:51:22.7951759Z 2025-05-07T19:51:22.7951971Z TORCH_LIBRARIES: 2025-05-07T19:51:22.7952189Z torch 2025-05-07T19:51:22.7952417Z torch_library 2025-05-07T19:51:22.7952858Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so 2025-05-07T19:51:22.7953554Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:51:22.7954246Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:51:22.7954794Z 2025-05-07T19:51:22.7955004Z TORCH_CUDA_OPTIONS: 2025-05-07T19:51:22.7955245Z --expt-relaxed-constexpr 2025-05-07T19:51:22.7955528Z -D__CUDA_NO_HALF_OPERATORS__ 2025-05-07T19:51:22.7955814Z -D__CUDA_NO_BFLOAT16_CONVERSIONS__ 2025-05-07T19:51:22.7956124Z -D__CUDA_NO_HALF2_OPERATORS__ 2025-05-07T19:51:22.7956415Z ================================================================================ 2025-05-07T19:51:22.7956665Z 2025-05-07T19:51:22.7956687Z 2025-05-07T19:51:22.7956691Z 2025-05-07T19:51:22.7956809Z ================================================================================ 2025-05-07T19:51:22.7957136Z NCCL Flags 2025-05-07T19:51:22.7957256Z 2025-05-07T19:51:22.7957635Z NCCL_INCLUDE_DIRS=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include 2025-05-07T19:51:22.7958553Z NCCL_LIBRARIES=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:51:22.7959306Z ================================================================================ 2025-05-07T19:51:22.7959544Z 2025-05-07T19:51:22.7959548Z 2025-05-07T19:51:22.7959552Z 2025-05-07T19:51:22.7959668Z ================================================================================ 2025-05-07T19:51:22.7959996Z CUDA Driver Path 2025-05-07T19:51:22.7960133Z 2025-05-07T19:51:22.7960489Z CUDA_DRIVER_LIBRARIES=/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:51:22.7961076Z ================================================================================ 2025-05-07T19:51:22.7961298Z 2025-05-07T19:51:22.7961601Z -- Found NVML_LIB_PATH: /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:51:22.7980782Z 2025-05-07T19:51:22.7981307Z 2025-05-07T19:51:22.7981857Z ================================================================================ 2025-05-07T19:51:22.7983043Z GPU CPP Library Target: asmjit (SHARED) 2025-05-07T19:51:22.7983898Z 2025-05-07T19:51:22.7984427Z CPU_SRCS: 2025-05-07T19:51:22.7984754Z 2025-05-07T19:51:22.7984963Z 2025-05-07T19:51:22.7985477Z GPU_SRCS: 2025-05-07T19:51:22.7985790Z 2025-05-07T19:51:22.7985994Z 2025-05-07T19:51:22.7986524Z CUDA_SPECIFIC_SRCS: 2025-05-07T19:51:22.7986932Z 2025-05-07T19:51:22.7987164Z 2025-05-07T19:51:22.7987688Z HIP_SPECIFIC_SRCS: 2025-05-07T19:51:22.7988073Z 2025-05-07T19:51:22.7988290Z 2025-05-07T19:51:22.7988775Z OTHER_SRCS: 2025-05-07T19:51:22.7989862Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64assembler.cpp 2025-05-07T19:51:22.7991150Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64builder.cpp 2025-05-07T19:51:22.7991756Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64compiler.cpp 2025-05-07T19:51:22.7992361Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64emithelper.cpp 2025-05-07T19:51:22.7992983Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64formatter.cpp 2025-05-07T19:51:22.7993769Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64func.cpp 2025-05-07T19:51:22.7994356Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64instapi.cpp 2025-05-07T19:51:22.7995053Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64instdb.cpp 2025-05-07T19:51:22.7995685Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64operand.cpp 2025-05-07T19:51:22.7996265Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64rapass.cpp 2025-05-07T19:51:22.7996866Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/armformatter.cpp 2025-05-07T19:51:22.7997464Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/archtraits.cpp 2025-05-07T19:51:22.7998077Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/assembler.cpp 2025-05-07T19:51:22.7998649Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/builder.cpp 2025-05-07T19:51:22.7999244Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/codeholder.cpp 2025-05-07T19:51:22.7999849Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/codewriter.cpp 2025-05-07T19:51:22.8000442Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/compiler.cpp 2025-05-07T19:51:22.8001043Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/constpool.cpp 2025-05-07T19:51:22.8001619Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/cpuinfo.cpp 2025-05-07T19:51:22.8002208Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/emithelper.cpp 2025-05-07T19:51:22.8002786Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/emitter.cpp 2025-05-07T19:51:22.8003385Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/emitterutils.cpp 2025-05-07T19:51:22.8004002Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/environment.cpp 2025-05-07T19:51:22.8004602Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/errorhandler.cpp 2025-05-07T19:51:22.8005215Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/formatter.cpp 2025-05-07T19:51:22.8005777Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/func.cpp 2025-05-07T19:51:22.8006378Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/funcargscontext.cpp 2025-05-07T19:51:22.8006988Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/globals.cpp 2025-05-07T19:51:22.8007541Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/inst.cpp 2025-05-07T19:51:22.8008107Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/instdb.cpp 2025-05-07T19:51:22.8008692Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/jitallocator.cpp 2025-05-07T19:51:22.8009305Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/jitruntime.cpp 2025-05-07T19:51:22.8009885Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/logger.cpp 2025-05-07T19:51:22.8010461Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/operand.cpp 2025-05-07T19:51:22.8011039Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/osutils.cpp 2025-05-07T19:51:22.8011708Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/ralocal.cpp 2025-05-07T19:51:22.8012274Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/rapass.cpp 2025-05-07T19:51:22.8012814Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/rastack.cpp 2025-05-07T19:51:22.8013372Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/string.cpp 2025-05-07T19:51:22.8013915Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/support.cpp 2025-05-07T19:51:22.8014472Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/target.cpp 2025-05-07T19:51:22.8015020Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/type.cpp 2025-05-07T19:51:22.8015642Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/virtmem.cpp 2025-05-07T19:51:22.8016461Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zone.cpp 2025-05-07T19:51:22.8017022Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonehash.cpp 2025-05-07T19:51:22.8017675Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonelist.cpp 2025-05-07T19:51:22.8018269Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonestack.cpp 2025-05-07T19:51:22.8018848Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonetree.cpp 2025-05-07T19:51:22.8019443Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonevector.cpp 2025-05-07T19:51:22.8020039Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86assembler.cpp 2025-05-07T19:51:22.8020639Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86builder.cpp 2025-05-07T19:51:22.8021225Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86compiler.cpp 2025-05-07T19:51:22.8021850Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86emithelper.cpp 2025-05-07T19:51:22.8022562Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86formatter.cpp 2025-05-07T19:51:22.8023123Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86func.cpp 2025-05-07T19:51:22.8023691Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86instapi.cpp 2025-05-07T19:51:22.8024249Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86instdb.cpp 2025-05-07T19:51:22.8024818Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86operand.cpp 2025-05-07T19:51:22.8025375Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86rapass.cpp 2025-05-07T19:51:22.8025799Z 2025-05-07T19:51:22.8025992Z CC_FLAGS: 2025-05-07T19:51:22.8026103Z 2025-05-07T19:51:22.8026180Z 2025-05-07T19:51:22.8026370Z NVCC_FLAGS: 2025-05-07T19:51:22.8026483Z 2025-05-07T19:51:22.8026560Z 2025-05-07T19:51:22.8026751Z HIPCC_FLAGS: 2025-05-07T19:51:22.8026875Z 2025-05-07T19:51:22.8026954Z 2025-05-07T19:51:22.8027146Z INCLUDE_DIRS: 2025-05-07T19:51:22.8027368Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:22.8027685Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:51:22.8027958Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:51:22.8028271Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:22.8028765Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include 2025-05-07T19:51:22.8029547Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:51:22.8030194Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:51:22.8030595Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:51:22.8031027Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:51:22.8031483Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:51:22.8032002Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 2025-05-07T19:51:22.8032466Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:51:22.8033008Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include 2025-05-07T19:51:22.8033516Z 2025-05-07T19:51:22.8033707Z Selected Source Files: 2025-05-07T19:51:22.8033876Z 2025-05-07T19:51:22.8033958Z 2025-05-07T19:51:22.8034145Z HIPified Source Files: 2025-05-07T19:51:22.8034314Z 2025-05-07T19:51:22.8034392Z 2025-05-07T19:51:22.8034589Z Library Dependencies: 2025-05-07T19:51:22.8034827Z torch 2025-05-07T19:51:22.8035032Z torch_library 2025-05-07T19:51:22.8035453Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so 2025-05-07T19:51:22.8036123Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:51:22.8036787Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:51:22.8037565Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:51:22.8038357Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:51:22.8038947Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:51:22.8039344Z 2025-05-07T19:51:22.8039588Z Output Library: 2025-05-07T19:51:22.8039801Z asmjit 2025-05-07T19:51:22.8039979Z 2025-05-07T19:51:22.8040192Z Destination Directory: 2025-05-07T19:51:22.8040417Z fbgemm_gpu 2025-05-07T19:51:22.8040650Z ================================================================================ 2025-05-07T19:51:22.8040873Z 2025-05-07T19:51:22.8040901Z 2025-05-07T19:51:22.8040919Z 2025-05-07T19:51:22.8041027Z ================================================================================ 2025-05-07T19:51:22.8041352Z GPU CPP Library Target: fbgemm (SHARED) 2025-05-07T19:51:22.8041648Z 2025-05-07T19:51:22.8041827Z CPU_SRCS: 2025-05-07T19:51:22.8041952Z 2025-05-07T19:51:22.8042026Z 2025-05-07T19:51:22.8042219Z GPU_SRCS: 2025-05-07T19:51:22.8042328Z 2025-05-07T19:51:22.8042403Z 2025-05-07T19:51:22.8042596Z CUDA_SPECIFIC_SRCS: 2025-05-07T19:51:22.8042733Z 2025-05-07T19:51:22.8042808Z 2025-05-07T19:51:22.8043008Z HIP_SPECIFIC_SRCS: 2025-05-07T19:51:22.8043141Z 2025-05-07T19:51:22.8043216Z 2025-05-07T19:51:22.8043409Z OTHER_SRCS: 2025-05-07T19:51:22.8043666Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDM.cc 2025-05-07T19:51:22.8044104Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAutovec.cc 2025-05-07T19:51:22.8044560Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMNBit.cc 2025-05-07T19:51:22.8044953Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/QuantUtils.cc 2025-05-07T19:51:22.8045357Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/RefImplementations.cc 2025-05-07T19:51:22.8045815Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/RowWiseSparseAdagradFused.cc 2025-05-07T19:51:22.8046267Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/SparseAdagrad.cc 2025-05-07T19:51:22.8046620Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/Utils.cc 2025-05-07T19:51:22.8047012Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAvx2.cc 2025-05-07T19:51:22.8047421Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/QuantUtilsAvx2.cc 2025-05-07T19:51:22.8047833Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAvx2.cc 2025-05-07T19:51:22.8048249Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/QuantUtilsAvx2.cc 2025-05-07T19:51:22.8048659Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAvx512.cc 2025-05-07T19:51:22.8049020Z 2025-05-07T19:51:22.8049196Z CC_FLAGS: 2025-05-07T19:51:22.8049322Z 2025-05-07T19:51:22.8049398Z 2025-05-07T19:51:22.8049572Z NVCC_FLAGS: 2025-05-07T19:51:22.8049700Z 2025-05-07T19:51:22.8049774Z 2025-05-07T19:51:22.8049953Z HIPCC_FLAGS: 2025-05-07T19:51:22.8050085Z 2025-05-07T19:51:22.8050160Z 2025-05-07T19:51:22.8050350Z INCLUDE_DIRS: 2025-05-07T19:51:22.8050573Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:22.8050886Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:51:22.8051155Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:51:22.8051466Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:22.8051933Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include 2025-05-07T19:51:22.8052705Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:51:22.8053331Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:51:22.8053736Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:51:22.8054156Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:51:22.8054619Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:51:22.8055128Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 2025-05-07T19:51:22.8055648Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:51:22.8056373Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include 2025-05-07T19:51:22.8056871Z 2025-05-07T19:51:22.8057203Z Selected Source Files: 2025-05-07T19:51:22.8057354Z 2025-05-07T19:51:22.8057449Z 2025-05-07T19:51:22.8057641Z HIPified Source Files: 2025-05-07T19:51:22.8057792Z 2025-05-07T19:51:22.8057885Z 2025-05-07T19:51:22.8058081Z Library Dependencies: 2025-05-07T19:51:22.8058326Z torch 2025-05-07T19:51:22.8058579Z torch_library 2025-05-07T19:51:22.8059022Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so 2025-05-07T19:51:22.8059691Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:51:22.8060393Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:51:22.8061198Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:51:22.8062033Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:51:22.8062471Z asmjit 2025-05-07T19:51:22.8062766Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:51:22.8063146Z 2025-05-07T19:51:22.8063333Z Output Library: 2025-05-07T19:51:22.8063542Z fbgemm 2025-05-07T19:51:22.8063712Z 2025-05-07T19:51:22.8063911Z Destination Directory: 2025-05-07T19:51:22.8064130Z fbgemm_gpu 2025-05-07T19:51:22.8064359Z ================================================================================ 2025-05-07T19:51:22.8064570Z 2025-05-07T19:51:22.8064574Z 2025-05-07T19:51:22.8064577Z 2025-05-07T19:51:22.8064696Z ================================================================================ 2025-05-07T19:51:22.8064999Z Running code generation script ... 2025-05-07T19:51:22.8065699Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_backward_split.py --opensource 2025-05-07T19:51:22.8066407Z ================================================================================ 2025-05-07T19:51:22.8066636Z 2025-05-07T19:51:23.3500895Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:51:23.3502071Z [GENERAATE BACKWARD SPLIT]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_backward_split.py', '--opensource'] 2025-05-07T19:51:23.3502892Z Written: gen_embedding_backward_dense_split_weighted_vbe_cuda.cu 2025-05-07T19:51:23.3503556Z Written: gen_embedding_backward_dense_split_weighted_cuda.cu 2025-05-07T19:51:23.3504075Z Written: gen_embedding_backward_dense_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:23.3504593Z Written: gen_embedding_backward_dense_split_unweighted_vbe_cuda.cu 2025-05-07T19:51:23.3505101Z Written: gen_embedding_backward_dense_split_unweighted_cuda.cu 2025-05-07T19:51:23.3505703Z Written: gen_embedding_backward_dense_split_weighted_vbe_meta.cpp 2025-05-07T19:51:23.3506300Z Written: gen_embedding_backward_dense_split_weighted_meta.cpp 2025-05-07T19:51:23.3506768Z Written: gen_embedding_backward_dense_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:23.3507268Z Written: gen_embedding_backward_dense_split_unweighted_vbe_meta.cpp 2025-05-07T19:51:23.3507751Z Written: gen_embedding_backward_dense_split_unweighted_meta.cpp 2025-05-07T19:51:23.3508219Z Written: gen_embedding_backward_dense_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:23.3508720Z Written: gen_embedding_backward_dense_split_weighted_kernel_cta.cu 2025-05-07T19:51:23.3509221Z Written: gen_embedding_backward_dense_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:23.3509766Z Written: gen_embedding_backward_dense_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:23.3510287Z Written: gen_embedding_backward_dense_split_unweighted_kernel_cta.cu 2025-05-07T19:51:23.3510782Z Written: gen_embedding_backward_dense_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:23.3511295Z Written: gen_embedding_backward_dense_split_weighted_kernel_warp.cu 2025-05-07T19:51:23.3511797Z Written: gen_embedding_backward_dense_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:23.3512345Z Written: gen_embedding_backward_dense_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:23.3513151Z Written: gen_embedding_backward_dense_split_unweighted_kernel_warp.cu 2025-05-07T19:51:23.3513638Z Written: gen_embedding_optimizer_dense_split_device_kernel.cuh 2025-05-07T19:51:23.3514055Z Written: gen_embedding_backward_split_dense.cpp 2025-05-07T19:51:23.3515293Z Written: gen_embedding_backward_dense_split_cpu.cpp 2025-05-07T19:51:23.3515718Z Written: gen_embedding_backward_adagrad_split_weighted_cuda.cu 2025-05-07T19:51:23.3516185Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:23.3516677Z Written: gen_embedding_backward_adagrad_split_unweighted_cuda.cu 2025-05-07T19:51:23.3517122Z Written: gen_embedding_backward_adagrad_split_weighted_meta.cpp 2025-05-07T19:51:23.3517614Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:23.3518099Z Written: gen_embedding_backward_adagrad_split_unweighted_meta.cpp 2025-05-07T19:51:23.3518583Z Written: gen_embedding_backward_adagrad_split_weighted_kernel_cta.cu 2025-05-07T19:51:23.3519111Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:23.3519631Z Written: gen_embedding_backward_adagrad_split_unweighted_kernel_cta.cu 2025-05-07T19:51:23.3520137Z Written: gen_embedding_backward_adagrad_split_weighted_kernel_warp.cu 2025-05-07T19:51:23.3520657Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:23.3521199Z Written: gen_embedding_backward_adagrad_split_unweighted_kernel_warp.cu 2025-05-07T19:51:23.3521692Z Written: gen_embedding_optimizer_adagrad_split_device_kernel.cuh 2025-05-07T19:51:23.3522097Z Written: gen_embedding_backward_split_adagrad.cpp 2025-05-07T19:51:23.3522487Z Written: gen_embedding_split_adagrad_pt2_autograd.cpp 2025-05-07T19:51:23.3522911Z Written: gen_embedding_backward_split_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:51:23.3523310Z Written: lookup_adagrad.py 2025-05-07T19:51:23.3523612Z Written: gen_embedding_backward_adagrad_split_cpu.cpp 2025-05-07T19:51:23.3524007Z Written: gen_embedding_backward_split_adagrad_cpu.cpp 2025-05-07T19:51:23.3524427Z Written: gen_embedding_backward_split_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:51:23.3524897Z Written: gen_embedding_backward_adam_split_weighted_vbe_cuda.cu 2025-05-07T19:51:23.3525347Z Written: gen_embedding_backward_adam_split_weighted_cuda.cu 2025-05-07T19:51:23.3525790Z Written: gen_embedding_backward_adam_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:23.3526272Z Written: gen_embedding_backward_adam_split_unweighted_vbe_cuda.cu 2025-05-07T19:51:23.3526713Z Written: gen_embedding_backward_adam_split_unweighted_cuda.cu 2025-05-07T19:51:23.3527171Z Written: gen_embedding_backward_adam_split_weighted_vbe_meta.cpp 2025-05-07T19:51:23.3527606Z Written: gen_embedding_backward_adam_split_weighted_meta.cpp 2025-05-07T19:51:23.3528074Z Written: gen_embedding_backward_adam_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:23.3528567Z Written: gen_embedding_backward_adam_split_unweighted_vbe_meta.cpp 2025-05-07T19:51:23.3529023Z Written: gen_embedding_backward_adam_split_unweighted_meta.cpp 2025-05-07T19:51:23.3529500Z Written: gen_embedding_backward_adam_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:23.3529972Z Written: gen_embedding_backward_adam_split_weighted_kernel_cta.cu 2025-05-07T19:51:23.3530479Z Written: gen_embedding_backward_adam_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:23.3530994Z Written: gen_embedding_backward_adam_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:23.3531499Z Written: gen_embedding_backward_adam_split_unweighted_kernel_cta.cu 2025-05-07T19:51:23.3531997Z Written: gen_embedding_backward_adam_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:23.3532477Z Written: gen_embedding_backward_adam_split_weighted_kernel_warp.cu 2025-05-07T19:51:23.3532986Z Written: gen_embedding_backward_adam_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:23.3533503Z Written: gen_embedding_backward_adam_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:23.3534019Z Written: gen_embedding_backward_adam_split_unweighted_kernel_warp.cu 2025-05-07T19:51:23.3534569Z Written: gen_embedding_optimizer_adam_split_device_kernel.cuh 2025-05-07T19:51:23.3534979Z Written: gen_embedding_backward_split_adam.cpp 2025-05-07T19:51:23.3535350Z Written: gen_embedding_split_adam_pt2_autograd.cpp 2025-05-07T19:51:23.3536188Z Written: gen_embedding_backward_split_adam_pt2_cuda_wrapper.cpp 2025-05-07T19:51:23.3536657Z Written: lookup_adam.py 2025-05-07T19:51:23.3536957Z Written: gen_embedding_backward_split_adam_cpu.cpp 2025-05-07T19:51:23.3537408Z Written: gen_embedding_backward_split_adam_pt2_cpu_wrapper.cpp 2025-05-07T19:51:23.3537875Z Written: gen_embedding_backward_lamb_split_weighted_cuda.cu 2025-05-07T19:51:23.3538370Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:23.3538861Z Written: gen_embedding_backward_lamb_split_unweighted_cuda.cu 2025-05-07T19:51:23.3550265Z Written: gen_embedding_backward_lamb_split_weighted_meta.cpp 2025-05-07T19:51:23.3550872Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:23.3551390Z Written: gen_embedding_backward_lamb_split_unweighted_meta.cpp 2025-05-07T19:51:23.3551847Z Written: gen_embedding_backward_lamb_split_weighted_kernel_cta.cu 2025-05-07T19:51:23.3552370Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:23.3552895Z Written: gen_embedding_backward_lamb_split_unweighted_kernel_cta.cu 2025-05-07T19:51:23.3553375Z Written: gen_embedding_backward_lamb_split_weighted_kernel_warp.cu 2025-05-07T19:51:23.3553892Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:23.3554401Z Written: gen_embedding_backward_lamb_split_unweighted_kernel_warp.cu 2025-05-07T19:51:23.3554885Z Written: gen_embedding_optimizer_lamb_split_device_kernel.cuh 2025-05-07T19:51:23.3555279Z Written: gen_embedding_backward_split_lamb.cpp 2025-05-07T19:51:23.3555650Z Written: gen_embedding_split_lamb_pt2_autograd.cpp 2025-05-07T19:51:23.3556077Z Written: gen_embedding_backward_split_lamb_pt2_cuda_wrapper.cpp 2025-05-07T19:51:23.3556454Z Written: lookup_lamb.py 2025-05-07T19:51:23.3556756Z Written: gen_embedding_backward_split_lamb_cpu.cpp 2025-05-07T19:51:23.3557159Z Written: gen_embedding_backward_split_lamb_pt2_cpu_wrapper.cpp 2025-05-07T19:51:23.3557631Z Written: gen_embedding_backward_lars_sgd_split_weighted_cuda.cu 2025-05-07T19:51:23.3558107Z Written: gen_embedding_backward_lars_sgd_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:23.3558616Z Written: gen_embedding_backward_lars_sgd_split_unweighted_cuda.cu 2025-05-07T19:51:23.3559078Z Written: gen_embedding_backward_lars_sgd_split_weighted_meta.cpp 2025-05-07T19:51:23.3559586Z Written: gen_embedding_backward_lars_sgd_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:23.3560096Z Written: gen_embedding_backward_lars_sgd_split_unweighted_meta.cpp 2025-05-07T19:51:23.3560572Z Written: gen_embedding_backward_lars_sgd_split_weighted_kernel_cta.cu 2025-05-07T19:51:23.3561103Z Written: gen_embedding_backward_lars_sgd_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:23.3561636Z Written: gen_embedding_backward_lars_sgd_split_unweighted_kernel_cta.cu 2025-05-07T19:51:23.3562149Z Written: gen_embedding_backward_lars_sgd_split_weighted_kernel_warp.cu 2025-05-07T19:51:23.3562693Z Written: gen_embedding_backward_lars_sgd_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:23.3563225Z Written: gen_embedding_backward_lars_sgd_split_unweighted_kernel_warp.cu 2025-05-07T19:51:23.3563726Z Written: gen_embedding_optimizer_lars_sgd_split_device_kernel.cuh 2025-05-07T19:51:23.3564136Z Written: gen_embedding_backward_split_lars_sgd.cpp 2025-05-07T19:51:23.3564524Z Written: gen_embedding_split_lars_sgd_pt2_autograd.cpp 2025-05-07T19:51:23.3564950Z Written: gen_embedding_backward_split_lars_sgd_pt2_cuda_wrapper.cpp 2025-05-07T19:51:23.3565350Z Written: lookup_lars_sgd.py 2025-05-07T19:51:23.3565658Z Written: gen_embedding_backward_split_lars_sgd_cpu.cpp 2025-05-07T19:51:23.3566101Z Written: gen_embedding_backward_split_lars_sgd_pt2_cpu_wrapper.cpp 2025-05-07T19:51:23.3566797Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_cuda.cu 2025-05-07T19:51:23.3567359Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:23.3568042Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_cuda.cu 2025-05-07T19:51:23.3568593Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_meta.cpp 2025-05-07T19:51:23.3569180Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:23.3569778Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_meta.cpp 2025-05-07T19:51:23.3570347Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_kernel_cta.cu 2025-05-07T19:51:23.3570974Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:23.3571596Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_kernel_cta.cu 2025-05-07T19:51:23.3572208Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_kernel_warp.cu 2025-05-07T19:51:23.3572824Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:23.3573464Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_kernel_warp.cu 2025-05-07T19:51:23.4395519Z Written: gen_embedding_optimizer_partial_rowwise_adam_split_device_kernel.cuh 2025-05-07T19:51:23.4397262Z Written: gen_embedding_backward_split_partial_rowwise_adam.cpp 2025-05-07T19:51:23.4398688Z Written: gen_embedding_split_partial_rowwise_adam_pt2_autograd.cpp 2025-05-07T19:51:23.4400333Z Written: gen_embedding_backward_split_partial_rowwise_adam_pt2_cuda_wrapper.cpp 2025-05-07T19:51:23.4401717Z Written: lookup_partial_rowwise_adam.py 2025-05-07T19:51:23.4402870Z Written: gen_embedding_backward_split_partial_rowwise_adam_cpu.cpp 2025-05-07T19:51:23.4403390Z Written: gen_embedding_backward_split_partial_rowwise_adam_pt2_cpu_wrapper.cpp 2025-05-07T19:51:23.4403980Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_cuda.cu 2025-05-07T19:51:23.4404562Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:23.4405135Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_cuda.cu 2025-05-07T19:51:23.4405710Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_meta.cpp 2025-05-07T19:51:23.4406283Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:23.4406890Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_meta.cpp 2025-05-07T19:51:23.4407458Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_kernel_cta.cu 2025-05-07T19:51:23.4408095Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:23.4408734Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_kernel_cta.cu 2025-05-07T19:51:23.4409324Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_kernel_warp.cu 2025-05-07T19:51:23.4409969Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:23.4410594Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_kernel_warp.cu 2025-05-07T19:51:23.4411199Z Written: gen_embedding_optimizer_partial_rowwise_lamb_split_device_kernel.cuh 2025-05-07T19:51:23.4411724Z Written: gen_embedding_backward_split_partial_rowwise_lamb.cpp 2025-05-07T19:51:23.4412185Z Written: gen_embedding_split_partial_rowwise_lamb_pt2_autograd.cpp 2025-05-07T19:51:23.4412728Z Written: gen_embedding_backward_split_partial_rowwise_lamb_pt2_cuda_wrapper.cpp 2025-05-07T19:51:23.4413171Z Written: lookup_partial_rowwise_lamb.py 2025-05-07T19:51:23.4413579Z Written: gen_embedding_backward_split_partial_rowwise_lamb_cpu.cpp 2025-05-07T19:51:23.4414092Z Written: gen_embedding_backward_split_partial_rowwise_lamb_pt2_cpu_wrapper.cpp 2025-05-07T19:51:23.4414644Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_cuda.cu 2025-05-07T19:51:23.4415571Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_cuda.cu 2025-05-07T19:51:23.4416284Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_cuda.cu 2025-05-07T19:51:23.4416985Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_cuda.cu 2025-05-07T19:51:23.4417547Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_cuda.cu 2025-05-07T19:51:23.4418153Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:23.4418736Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_cuda.cu 2025-05-07T19:51:23.4419329Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_cuda.cu 2025-05-07T19:51:23.4419903Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_cuda.cu 2025-05-07T19:51:23.4420448Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_cuda.cu 2025-05-07T19:51:23.4421025Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_meta.cpp 2025-05-07T19:51:23.4421598Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_meta.cpp 2025-05-07T19:51:23.4422271Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_meta.cpp 2025-05-07T19:51:23.4422774Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_meta.cpp 2025-05-07T19:51:23.4423320Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_meta.cpp 2025-05-07T19:51:23.4423892Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:23.4424438Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_meta.cpp 2025-05-07T19:51:23.4424996Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_meta.cpp 2025-05-07T19:51:23.4425522Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_meta.cpp 2025-05-07T19:51:23.4426054Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_meta.cpp 2025-05-07T19:51:23.4426582Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:23.4427160Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:23.4427723Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_kernel_cta.cu 2025-05-07T19:51:23.4428259Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_kernel_cta.cu 2025-05-07T19:51:23.4428834Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:23.4429430Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:23.4430035Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:23.4430629Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:23.4431191Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_kernel_cta.cu 2025-05-07T19:51:23.4431759Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_kernel_cta.cu 2025-05-07T19:51:23.4432322Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:23.4432913Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:23.4433470Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_kernel_warp.cu 2025-05-07T19:51:23.4434032Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_kernel_warp.cu 2025-05-07T19:51:23.4434613Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:23.4435210Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:23.4435826Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:23.4436407Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:23.4436991Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_kernel_warp.cu 2025-05-07T19:51:23.4437644Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_kernel_warp.cu 2025-05-07T19:51:23.4438225Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_gwd_kernel_cta.cu 2025-05-07T19:51:23.4438901Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_gwd_kernel_cta.cu 2025-05-07T19:51:23.4439500Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_gwd_kernel_cta.cu 2025-05-07T19:51:23.4440124Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_gwd_kernel_cta.cu 2025-05-07T19:51:23.4440720Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_gwd_kernel_warp.cu 2025-05-07T19:51:23.4441330Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_gwd_kernel_warp.cu 2025-05-07T19:51:23.4441944Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_gwd_kernel_warp.cu 2025-05-07T19:51:23.4442549Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_gwd_kernel_warp.cu 2025-05-07T19:51:23.4443145Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_gwd_cuda.cu 2025-05-07T19:51:23.4443696Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_gwd_cuda.cu 2025-05-07T19:51:23.4444245Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_gwd_cuda.cu 2025-05-07T19:51:23.4444814Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_gwd_cuda.cu 2025-05-07T19:51:23.4445334Z Written: gen_embedding_optimizer_rowwise_adagrad_ssd_device_kernel.cuh 2025-05-07T19:51:23.4445856Z Written: gen_embedding_optimizer_rowwise_adagrad_split_device_kernel.cuh 2025-05-07T19:51:23.4446310Z Written: gen_embedding_backward_ssd_rowwise_adagrad.cpp 2025-05-07T19:51:23.4446729Z Written: gen_embedding_ssd_rowwise_adagrad_pt2_autograd.cpp 2025-05-07T19:51:23.4447215Z Written: gen_embedding_backward_ssd_rowwise_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:51:23.4447635Z Written: lookup_rowwise_adagrad_ssd.py 2025-05-07T19:51:23.4448001Z Written: gen_embedding_backward_split_rowwise_adagrad.cpp 2025-05-07T19:51:23.4448420Z Written: gen_embedding_split_rowwise_adagrad_pt2_autograd.cpp 2025-05-07T19:51:23.4448916Z Written: gen_embedding_backward_split_rowwise_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:51:23.4449335Z Written: lookup_rowwise_adagrad.py 2025-05-07T19:51:23.4449696Z Written: gen_embedding_backward_rowwise_adagrad_split_cpu.cpp 2025-05-07T19:51:23.4450128Z Written: gen_embedding_backward_split_rowwise_adagrad_cpu.cpp 2025-05-07T19:51:23.4450613Z Written: gen_embedding_backward_split_rowwise_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:51:23.4451171Z Written: gen_embedding_optimizer_approx_rowwise_adagrad_split_device_kernel.cuh 2025-05-07T19:51:23.4451685Z Written: gen_embedding_backward_split_approx_rowwise_adagrad.cpp 2025-05-07T19:51:23.4452168Z Written: gen_embedding_split_approx_rowwise_adagrad_pt2_autograd.cpp 2025-05-07T19:51:23.4452691Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:51:23.4453238Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_cpu.cpp 2025-05-07T19:51:23.4453758Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:51:23.4454367Z Written: gen_embedding_optimizer_rowwise_adagrad_with_weight_decay_split_device_kernel.cuh 2025-05-07T19:51:23.4454968Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay.cpp 2025-05-07T19:51:23.4455615Z Written: gen_embedding_split_rowwise_adagrad_with_weight_decay_pt2_autograd.cpp 2025-05-07T19:51:23.4456534Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_pt2_cuda_wrapper.cpp 2025-05-07T19:51:23.4457191Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_cpu.cpp 2025-05-07T19:51:23.4457865Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_pt2_cpu_wrapper.cpp 2025-05-07T19:51:23.4458613Z Written: gen_embedding_optimizer_approx_rowwise_adagrad_with_weight_decay_split_device_kernel.cuh 2025-05-07T19:51:23.4459415Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay.cpp 2025-05-07T19:51:23.4460073Z Written: gen_embedding_split_approx_rowwise_adagrad_with_weight_decay_pt2_autograd.cpp 2025-05-07T19:51:23.4460849Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_pt2_cuda_wrapper.cpp 2025-05-07T19:51:23.4461573Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_cpu.cpp 2025-05-07T19:51:23.5452806Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_pt2_cpu_wrapper.cpp 2025-05-07T19:51:23.5453762Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_cuda.cu 2025-05-07T19:51:23.5454626Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_cuda.cu 2025-05-07T19:51:23.5455299Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:23.5456158Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_cuda.cu 2025-05-07T19:51:23.5456874Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_cuda.cu 2025-05-07T19:51:23.5457540Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_meta.cpp 2025-05-07T19:51:23.5458226Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_meta.cpp 2025-05-07T19:51:23.5458899Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:23.5459721Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_meta.cpp 2025-05-07T19:51:23.5460362Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_meta.cpp 2025-05-07T19:51:23.5461001Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:23.5461663Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_kernel_cta.cu 2025-05-07T19:51:23.5462324Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:23.5463039Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:23.5463717Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_kernel_cta.cu 2025-05-07T19:51:23.5464378Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:23.5465049Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_kernel_warp.cu 2025-05-07T19:51:23.5465717Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:23.5466424Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:23.5467112Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_kernel_warp.cu 2025-05-07T19:51:23.5467740Z Written: gen_embedding_optimizer_rowwise_adagrad_with_counter_split_device_kernel.cuh 2025-05-07T19:51:23.5468322Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter.cpp 2025-05-07T19:51:23.5468834Z Written: gen_embedding_split_rowwise_adagrad_with_counter_pt2_autograd.cpp 2025-05-07T19:51:23.5469426Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter_pt2_cuda_wrapper.cpp 2025-05-07T19:51:23.5469925Z Written: lookup_rowwise_adagrad_with_counter.py 2025-05-07T19:51:23.5470370Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter_cpu.cpp 2025-05-07T19:51:23.5470950Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter_pt2_cpu_wrapper.cpp 2025-05-07T19:51:23.5471571Z Written: gen_embedding_optimizer_approx_rowwise_adagrad_with_counter_split_device_kernel.cuh 2025-05-07T19:51:23.5472175Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter.cpp 2025-05-07T19:51:23.5472736Z Written: gen_embedding_split_approx_rowwise_adagrad_with_counter_pt2_autograd.cpp 2025-05-07T19:51:23.5473343Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_pt2_cuda_wrapper.cpp 2025-05-07T19:51:23.5474232Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_cpu.cpp 2025-05-07T19:51:23.5475421Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_pt2_cpu_wrapper.cpp 2025-05-07T19:51:23.5476199Z Written: gen_embedding_optimizer_rowwise_weighted_adagrad_split_device_kernel.cuh 2025-05-07T19:51:23.5476752Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad.cpp 2025-05-07T19:51:23.5477271Z Written: gen_embedding_split_rowwise_weighted_adagrad_pt2_autograd.cpp 2025-05-07T19:51:23.5477853Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:51:23.5478425Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad_cpu.cpp 2025-05-07T19:51:23.5479002Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:51:23.5479538Z Written: gen_embedding_backward_sgd_split_weighted_vbe_cuda.cu 2025-05-07T19:51:23.5479992Z Written: gen_embedding_backward_sgd_split_weighted_cuda.cu 2025-05-07T19:51:23.5480462Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:23.5480942Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_cuda.cu 2025-05-07T19:51:23.5481408Z Written: gen_embedding_backward_sgd_split_unweighted_cuda.cu 2025-05-07T19:51:23.5481951Z Written: gen_embedding_backward_sgd_split_weighted_vbe_meta.cpp 2025-05-07T19:51:23.5482387Z Written: gen_embedding_backward_sgd_split_weighted_meta.cpp 2025-05-07T19:51:23.5482818Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:23.5483277Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_meta.cpp 2025-05-07T19:51:23.5483708Z Written: gen_embedding_backward_sgd_split_unweighted_meta.cpp 2025-05-07T19:51:23.5484160Z Written: gen_embedding_backward_sgd_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:51:23.5484624Z Written: gen_embedding_backward_sgd_split_weighted_kernel_cta.cu 2025-05-07T19:51:23.5485124Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:23.5485629Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:51:23.5486108Z Written: gen_embedding_backward_sgd_split_unweighted_kernel_cta.cu 2025-05-07T19:51:23.5486574Z Written: gen_embedding_backward_sgd_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:51:23.5487044Z Written: gen_embedding_backward_sgd_split_weighted_kernel_warp.cu 2025-05-07T19:51:23.5487520Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:23.5488026Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:51:23.5488503Z Written: gen_embedding_backward_sgd_split_unweighted_kernel_warp.cu 2025-05-07T19:51:23.5488943Z Written: gen_embedding_optimizer_sgd_split_device_kernel.cuh 2025-05-07T19:51:23.5489322Z Written: gen_embedding_backward_split_sgd.cpp 2025-05-07T19:51:23.5489655Z Written: gen_embedding_split_sgd_pt2_autograd.cpp 2025-05-07T19:51:23.5490049Z Written: gen_embedding_backward_split_sgd_pt2_cuda_wrapper.cpp 2025-05-07T19:51:23.5490402Z Written: lookup_sgd.py 2025-05-07T19:51:23.5490663Z Written: gen_embedding_backward_sgd_split_cpu.cpp 2025-05-07T19:51:23.5491003Z Written: gen_embedding_backward_split_sgd_cpu.cpp 2025-05-07T19:51:23.5491388Z Written: gen_embedding_backward_split_sgd_pt2_cpu_wrapper.cpp 2025-05-07T19:51:23.5491829Z Written: gen_embedding_optimizer_approx_sgd_split_device_kernel.cuh 2025-05-07T19:51:23.5492250Z Written: gen_embedding_backward_split_approx_sgd.cpp 2025-05-07T19:51:23.5492626Z Written: gen_embedding_split_approx_sgd_pt2_autograd.cpp 2025-05-07T19:51:23.5493051Z Written: gen_embedding_backward_split_approx_sgd_pt2_cuda_wrapper.cpp 2025-05-07T19:51:23.5493496Z Written: gen_embedding_backward_split_approx_sgd_cpu.cpp 2025-05-07T19:51:23.5493924Z Written: gen_embedding_backward_split_approx_sgd_pt2_cpu_wrapper.cpp 2025-05-07T19:51:23.5494376Z Written: gen_embedding_backward_none_split_weighted_cuda.cu 2025-05-07T19:51:23.5494899Z Written: gen_embedding_backward_none_split_unweighted_nobag_cuda.cu 2025-05-07T19:51:23.5495429Z Written: gen_embedding_backward_none_split_unweighted_cuda.cu 2025-05-07T19:51:23.5496069Z Written: gen_embedding_backward_none_split_weighted_meta.cpp 2025-05-07T19:51:23.5496692Z Written: gen_embedding_backward_none_split_unweighted_nobag_meta.cpp 2025-05-07T19:51:23.5497191Z Written: gen_embedding_backward_none_split_unweighted_meta.cpp 2025-05-07T19:51:23.5497660Z Written: gen_embedding_backward_none_split_weighted_kernel_cta.cu 2025-05-07T19:51:23.5498184Z Written: gen_embedding_backward_none_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:51:23.5498709Z Written: gen_embedding_backward_none_split_unweighted_kernel_cta.cu 2025-05-07T19:51:23.5499205Z Written: gen_embedding_backward_none_split_weighted_kernel_warp.cu 2025-05-07T19:51:23.5499741Z Written: gen_embedding_backward_none_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:51:23.5500280Z Written: gen_embedding_backward_none_split_unweighted_kernel_warp.cu 2025-05-07T19:51:23.5500766Z Written: gen_embedding_optimizer_none_split_device_kernel.cuh 2025-05-07T19:51:23.5501171Z Written: gen_embedding_backward_split_none.cpp 2025-05-07T19:51:23.5501541Z Written: gen_embedding_split_none_pt2_autograd.cpp 2025-05-07T19:51:23.5501970Z Written: gen_embedding_backward_split_none_pt2_cuda_wrapper.cpp 2025-05-07T19:51:23.5502452Z Written: lookup_none.py 2025-05-07T19:51:23.5502712Z Written: gen_embedding_backward_split_none_cpu.cpp 2025-05-07T19:51:23.5503105Z Written: gen_embedding_backward_split_none_pt2_cpu_wrapper.cpp 2025-05-07T19:51:23.5503560Z Written: gen_embedding_backward_split_weighted_device_kernel_hip.hip 2025-05-07T19:51:23.5504061Z Written: gen_embedding_backward_split_unweighted_nobag_device_kernel_hip.hip 2025-05-07T19:51:23.5504575Z Written: gen_embedding_backward_split_unweighted_device_kernel_hip.hip 2025-05-07T19:51:23.5505042Z Written: gen_embedding_backward_ssd_weighted_vbe_device_kernel.cuh 2025-05-07T19:51:23.5505511Z Written: gen_embedding_backward_split_weighted_vbe_device_kernel.cuh 2025-05-07T19:51:23.5505958Z Written: gen_embedding_backward_ssd_weighted_device_kernel.cuh 2025-05-07T19:51:23.5506399Z Written: gen_embedding_backward_split_weighted_device_kernel.cuh 2025-05-07T19:51:23.5506863Z Written: gen_embedding_backward_ssd_unweighted_nobag_device_kernel.cuh 2025-05-07T19:51:23.5507359Z Written: gen_embedding_backward_split_unweighted_nobag_device_kernel.cuh 2025-05-07T19:51:23.5507852Z Written: gen_embedding_backward_ssd_unweighted_vbe_device_kernel.cuh 2025-05-07T19:51:23.5508328Z Written: gen_embedding_backward_split_unweighted_vbe_device_kernel.cuh 2025-05-07T19:51:23.5508800Z Written: gen_embedding_backward_ssd_unweighted_device_kernel.cuh 2025-05-07T19:51:23.5509240Z Written: gen_embedding_backward_split_unweighted_device_kernel.cuh 2025-05-07T19:51:23.5509687Z Written: gen_embedding_backward_split_common_device_kernel.cuh 2025-05-07T19:51:23.5510109Z Written: gen_embedding_backward_split_grad_embedding_ops.cu 2025-05-07T19:51:23.5510548Z Written: gen_embedding_backward_dense_indice_weights_codegen_cuda.cu 2025-05-07T19:51:23.5511007Z Written: gen_embedding_backward_ssd_indice_weights_codegen_cuda.cu 2025-05-07T19:51:23.5511461Z Written: gen_embedding_backward_split_indice_weights_codegen_cuda.cu 2025-05-07T19:51:23.5511828Z Written: pt2_arg_utils.h 2025-05-07T19:51:23.5512041Z Written: __init__.py 2025-05-07T19:51:23.5512262Z Written: lookup_args_ssd.py 2025-05-07T19:51:23.5512503Z Written: lookup_args.py 2025-05-07T19:51:23.5593553Z 2025-05-07T19:51:23.5593649Z 2025-05-07T19:51:23.5594184Z ================================================================================ 2025-05-07T19:51:23.5595289Z Running code generation script ... 2025-05-07T19:51:23.5597592Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_embedding_optimizer.py --opensource 2025-05-07T19:51:23.5599962Z ================================================================================ 2025-05-07T19:51:23.5601066Z 2025-05-07T19:51:23.6674259Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:51:23.6675790Z [GENERATE OPTIMIZERS]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_embedding_optimizer.py', '--opensource'] 2025-05-07T19:51:23.6676572Z Written: gen_embedding_optimizer_rowwise_adagrad_split_cuda.cu 2025-05-07T19:51:23.6677059Z Written: gen_embedding_optimizer_rowwise_adagrad_split_kernel.cu 2025-05-07T19:51:23.6677551Z Written: gen_embedding_optimizer_rowwise_adagrad_split.cpp 2025-05-07T19:51:23.6678069Z Written: gen_embedding_optimizer_rowwise_adagrad_split_device_kernel.cuh 2025-05-07T19:51:23.6678559Z Written: split_embedding_optimizer_rowwise_adagrad.py 2025-05-07T19:51:23.6679158Z Written: optimizer_args.py 2025-05-07T19:51:23.6780746Z 2025-05-07T19:51:23.6780915Z 2025-05-07T19:51:23.6781182Z ================================================================================ 2025-05-07T19:51:23.6781828Z Running code generation script ... 2025-05-07T19:51:23.6782670Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_quantized.py --opensource 2025-05-07T19:51:23.6783486Z ================================================================================ 2025-05-07T19:51:23.6783742Z 2025-05-07T19:51:23.7999365Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:51:23.8001997Z [GENERATE FORWARD QUANTIZED]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_quantized.py', '--opensource'] 2025-05-07T19:51:23.8004496Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp32_codegen_cuda.cu 2025-05-07T19:51:23.8006495Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp16_codegen_cuda.cu 2025-05-07T19:51:23.8008433Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp8_codegen_cuda.cu 2025-05-07T19:51:23.8010450Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int8_codegen_cuda.cu 2025-05-07T19:51:23.8012126Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int4_codegen_cuda.cu 2025-05-07T19:51:23.8012764Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int2_codegen_cuda.cu 2025-05-07T19:51:23.8013440Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp32_codegen_cuda.cu 2025-05-07T19:51:23.8014127Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp16_codegen_cuda.cu 2025-05-07T19:51:23.8014832Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp8_codegen_cuda.cu 2025-05-07T19:51:23.8015662Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int8_codegen_cuda.cu 2025-05-07T19:51:23.8016575Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int4_codegen_cuda.cu 2025-05-07T19:51:23.8017329Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int2_codegen_cuda.cu 2025-05-07T19:51:23.8018051Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp32_codegen_cuda.cu 2025-05-07T19:51:23.8018763Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp16_codegen_cuda.cu 2025-05-07T19:51:23.8019479Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp8_codegen_cuda.cu 2025-05-07T19:51:23.8020173Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int8_codegen_cuda.cu 2025-05-07T19:51:23.8020915Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int4_codegen_cuda.cu 2025-05-07T19:51:23.8021615Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int2_codegen_cuda.cu 2025-05-07T19:51:23.8022411Z Written: gen_embedding_forward_quantized_split_nbit_host_weighted_codegen_cuda.cu 2025-05-07T19:51:23.8023050Z Written: gen_embedding_forward_quantized_split_nbit_host_unweighted_nobag_codegen_cuda.cu 2025-05-07T19:51:23.8023949Z Written: gen_embedding_forward_quantized_split_nbit_host_unweighted_codegen_cuda.cu 2025-05-07T19:51:23.8024520Z Written: gen_embedding_forward_quantized_weighted_codegen_cpu.cpp 2025-05-07T19:51:23.8025005Z Written: gen_embedding_forward_quantized_unweighted_codegen_cpu.cpp 2025-05-07T19:51:23.8107956Z 2025-05-07T19:51:23.8108452Z 2025-05-07T19:51:23.8108998Z ================================================================================ 2025-05-07T19:51:23.8110086Z Running code generation script ... 2025-05-07T19:51:23.8112344Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_split.py --opensource 2025-05-07T19:51:23.8113456Z ================================================================================ 2025-05-07T19:51:23.8113684Z 2025-05-07T19:51:24.1524125Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:51:24.1525302Z [GENERATE FORWARD SPLIT]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_split.py', '--opensource'] 2025-05-07T19:51:24.1526223Z Written: gen_embedding_forward_dense_weighted_vbe_codegen_cuda.cu 2025-05-07T19:51:24.1526729Z Written: gen_embedding_forward_dense_weighted_codegen_cuda.cu 2025-05-07T19:51:24.1527226Z Written: gen_embedding_forward_dense_unweighted_vbe_codegen_cuda.cu 2025-05-07T19:51:24.1527740Z Written: gen_embedding_forward_dense_unweighted_codegen_cuda.cu 2025-05-07T19:51:24.1528220Z Written: gen_embedding_forward_ssd_weighted_vbe_codegen_cuda.cu 2025-05-07T19:51:24.1528720Z Written: gen_embedding_forward_split_weighted_vbe_codegen_cuda.cu 2025-05-07T19:51:24.1529209Z Written: gen_embedding_forward_ssd_weighted_codegen_cuda.cu 2025-05-07T19:51:24.1529665Z Written: gen_embedding_forward_split_weighted_codegen_cuda.cu 2025-05-07T19:51:24.1530160Z Written: gen_embedding_forward_ssd_unweighted_vbe_codegen_cuda.cu 2025-05-07T19:51:24.1530775Z Written: gen_embedding_forward_split_unweighted_vbe_codegen_cuda.cu 2025-05-07T19:51:24.1531277Z Written: gen_embedding_forward_ssd_unweighted_codegen_cuda.cu 2025-05-07T19:51:24.1531846Z Written: gen_embedding_forward_split_unweighted_codegen_cuda.cu 2025-05-07T19:51:24.1532342Z Written: gen_embedding_forward_split_weighted_vbe_gwd_codegen_cuda.cu 2025-05-07T19:51:24.1532866Z Written: gen_embedding_forward_split_weighted_gwd_codegen_cuda.cu 2025-05-07T19:51:24.1533349Z Written: gen_embedding_forward_split_unweighted_vbe_gwd_codegen_cuda.cu 2025-05-07T19:51:24.1533854Z Written: gen_embedding_forward_split_unweighted_gwd_codegen_cuda.cu 2025-05-07T19:51:24.1534340Z Written: gen_embedding_forward_dense_weighted_vbe_codegen_meta.cpp 2025-05-07T19:51:24.1534792Z Written: gen_embedding_forward_dense_weighted_codegen_meta.cpp 2025-05-07T19:51:24.1535265Z Written: gen_embedding_forward_dense_unweighted_vbe_codegen_meta.cpp 2025-05-07T19:51:24.1536063Z Written: gen_embedding_forward_dense_unweighted_codegen_meta.cpp 2025-05-07T19:51:24.1536570Z Written: gen_embedding_forward_ssd_weighted_vbe_codegen_meta.cpp 2025-05-07T19:51:24.1537069Z Written: gen_embedding_forward_split_weighted_vbe_codegen_meta.cpp 2025-05-07T19:51:24.1537567Z Written: gen_embedding_forward_ssd_weighted_codegen_meta.cpp 2025-05-07T19:51:24.1538037Z Written: gen_embedding_forward_split_weighted_codegen_meta.cpp 2025-05-07T19:51:24.1538546Z Written: gen_embedding_forward_ssd_unweighted_vbe_codegen_meta.cpp 2025-05-07T19:51:24.1539074Z Written: gen_embedding_forward_split_unweighted_vbe_codegen_meta.cpp 2025-05-07T19:51:24.1539571Z Written: gen_embedding_forward_ssd_unweighted_codegen_meta.cpp 2025-05-07T19:51:24.1540064Z Written: gen_embedding_forward_split_unweighted_codegen_meta.cpp 2025-05-07T19:51:24.1540530Z Written: gen_embedding_forward_dense_weighted_vbe_kernel.cu 2025-05-07T19:51:24.1540979Z Written: gen_embedding_forward_dense_weighted_kernel.cu 2025-05-07T19:51:24.1541428Z Written: gen_embedding_forward_dense_unweighted_nobag_kernel.cu 2025-05-07T19:51:24.1541918Z Written: gen_embedding_forward_dense_unweighted_vbe_kernel.cu 2025-05-07T19:51:24.1542733Z Written: gen_embedding_forward_dense_unweighted_kernel.cu 2025-05-07T19:51:24.1543137Z Written: gen_embedding_forward_ssd_weighted_vbe_kernel.cu 2025-05-07T19:51:24.1543565Z Written: gen_embedding_forward_split_weighted_vbe_kernel.cu 2025-05-07T19:51:24.1544065Z Written: gen_embedding_forward_ssd_weighted_kernel.cu 2025-05-07T19:51:24.1544479Z Written: gen_embedding_forward_split_weighted_kernel.cu 2025-05-07T19:51:24.1544889Z Written: gen_embedding_forward_ssd_unweighted_nobag_kernel.cu 2025-05-07T19:51:24.1545347Z Written: gen_embedding_forward_split_unweighted_nobag_kernel.cu 2025-05-07T19:51:24.1545801Z Written: gen_embedding_forward_ssd_unweighted_vbe_kernel.cu 2025-05-07T19:51:24.1546226Z Written: gen_embedding_forward_split_unweighted_vbe_kernel.cu 2025-05-07T19:51:24.1546656Z Written: gen_embedding_forward_ssd_unweighted_kernel.cu 2025-05-07T19:51:24.1547051Z Written: gen_embedding_forward_split_unweighted_kernel.cu 2025-05-07T19:51:24.1547490Z Written: gen_embedding_forward_split_weighted_vbe_gwd_kernel.cu 2025-05-07T19:51:24.1547923Z Written: gen_embedding_forward_split_weighted_gwd_kernel.cu 2025-05-07T19:51:24.1548373Z Written: gen_embedding_forward_split_unweighted_vbe_gwd_kernel.cu 2025-05-07T19:51:24.1548841Z Written: gen_embedding_forward_split_unweighted_gwd_kernel.cu 2025-05-07T19:51:24.1549256Z Written: gen_embedding_forward_split_weighted_v2_kernel.cu 2025-05-07T19:51:24.1549670Z Written: gen_embedding_forward_split_unweighted_v2_kernel.cu 2025-05-07T19:51:24.1550114Z Written: gen_embedding_forward_dense_unweighted_nobag_kernel_small.cu 2025-05-07T19:51:24.1550606Z Written: gen_embedding_forward_dense_unweighted_nobag_kernel_small.cu 2025-05-07T19:51:24.1551087Z Written: gen_embedding_forward_ssd_unweighted_nobag_kernel_small.cu 2025-05-07T19:51:24.1551556Z Written: gen_embedding_forward_split_unweighted_nobag_kernel_small.cu 2025-05-07T19:51:24.1552015Z Written: gen_embedding_forward_split_pt2_cuda_wrapper.cpp 2025-05-07T19:51:24.1552415Z Written: gen_embedding_forward_split_pt2_cpu_wrapper.cpp 2025-05-07T19:51:24.1552809Z Written: gen_embedding_forward_ssd_pt2_cuda_wrapper.cpp 2025-05-07T19:51:24.1656843Z 2025-05-07T19:51:24.1657415Z 2025-05-07T19:51:24.1657892Z ================================================================================ 2025-05-07T19:51:24.1658729Z Running code generation script ... 2025-05-07T19:51:24.1659491Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_index_select.py --opensource 2025-05-07T19:51:24.1660279Z ================================================================================ 2025-05-07T19:51:24.1660513Z 2025-05-07T19:51:24.4324509Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:51:24.4326944Z [INDEX SELECT GENERATOR]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_index_select.py', '--opensource'] 2025-05-07T19:51:24.4328987Z Written: gen_batch_index_select_dim0_forward_codegen_cuda.cu 2025-05-07T19:51:24.4330375Z Written: gen_batch_index_select_dim0_forward_kernel.cu 2025-05-07T19:51:24.4330768Z Written: gen_batch_index_select_dim0_forward_kernel_small.cu 2025-05-07T19:51:24.4331196Z Written: gen_batch_index_select_dim0_backward_codegen_cuda.cu 2025-05-07T19:51:24.4331614Z Written: gen_batch_index_select_dim0_backward_kernel_cta.cu 2025-05-07T19:51:24.4332025Z Written: gen_batch_index_select_dim0_backward_kernel_warp.cu 2025-05-07T19:51:24.4332483Z Written: gen_embedding_backward_split_batch_index_select_device_kernel.cuh 2025-05-07T19:51:24.4332950Z Written: gen_embedding_backward_split_grad_index_select.cu 2025-05-07T19:51:24.4333368Z Written: gen_embedding_backward_split_common_device_kernel.cuh 2025-05-07T19:51:24.4533670Z 2025-05-07T19:51:24.4533787Z 2025-05-07T19:51:24.4534293Z ================================================================================ 2025-05-07T19:51:24.4535907Z GPU CPP Library Target: fbgemm_gpu_experimental_gen_ai (SHARED) 2025-05-07T19:51:24.4537610Z 2025-05-07T19:51:24.4538208Z CPU_SRCS: 2025-05-07T19:51:24.4538592Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/attention.cpp 2025-05-07T19:51:24.4539202Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cpp 2025-05-07T19:51:24.4539884Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cpp 2025-05-07T19:51:24.4540437Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cpp 2025-05-07T19:51:24.4541057Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cpp 2025-05-07T19:51:24.4541721Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cpp 2025-05-07T19:51:24.4542296Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cpp 2025-05-07T19:51:24.4542733Z 2025-05-07T19:51:24.4542914Z GPU_SRCS: 2025-05-07T19:51:24.4543309Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/gqa_attn_splitk.cu 2025-05-07T19:51:24.4543904Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cu 2025-05-07T19:51:24.4544497Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu 2025-05-07T19:51:24.4545055Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cu 2025-05-07T19:51:24.4545649Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu 2025-05-07T19:51:24.4546290Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cu 2025-05-07T19:51:24.4546858Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cu 2025-05-07T19:51:24.4547282Z 2025-05-07T19:51:24.4547477Z CUDA_SPECIFIC_SRCS: 2025-05-07T19:51:24.4547988Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu 2025-05-07T19:51:24.4548796Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16.cu 2025-05-07T19:51:24.4549623Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu 2025-05-07T19:51:24.4550539Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu 2025-05-07T19:51:24.4551354Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16.cu 2025-05-07T19:51:24.4552215Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:51:24.4553170Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:51:24.4554107Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T19:51:24.4555053Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T19:51:24.4555989Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T19:51:24.4556932Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T19:51:24.4557879Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T19:51:24.4558809Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T19:51:24.4559751Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T19:51:24.4560795Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T19:51:24.4561714Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T19:51:24.4562636Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T19:51:24.4563753Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T19:51:24.4564739Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T19:51:24.4565649Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T19:51:24.4566564Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T19:51:24.4567482Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:51:24.4568382Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:51:24.4569304Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T19:51:24.4570236Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T19:51:24.4571150Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T19:51:24.4572069Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T19:51:24.4572974Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T19:51:24.4574073Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T19:51:24.4575217Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16.cu 2025-05-07T19:51:24.4576082Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu 2025-05-07T19:51:24.4576914Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu 2025-05-07T19:51:24.4577694Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_lite.cu 2025-05-07T19:51:24.4578504Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu 2025-05-07T19:51:24.4579486Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu 2025-05-07T19:51:24.4580618Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu 2025-05-07T19:51:24.4581757Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu 2025-05-07T19:51:24.4582884Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu 2025-05-07T19:51:24.4584022Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu 2025-05-07T19:51:24.4585411Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu 2025-05-07T19:51:24.4586679Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu 2025-05-07T19:51:24.4587809Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu 2025-05-07T19:51:24.4588935Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu 2025-05-07T19:51:24.4590221Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu 2025-05-07T19:51:24.4591737Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu 2025-05-07T19:51:24.4593016Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu 2025-05-07T19:51:24.4594151Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu 2025-05-07T19:51:24.4595276Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu 2025-05-07T19:51:24.4596363Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu 2025-05-07T19:51:24.4597211Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu 2025-05-07T19:51:24.4598021Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu 2025-05-07T19:51:24.4598806Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu 2025-05-07T19:51:24.4599644Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu 2025-05-07T19:51:24.4600426Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16.cu 2025-05-07T19:51:24.4601178Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu 2025-05-07T19:51:24.4601974Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.cu 2025-05-07T19:51:24.4602702Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16_fast_gemv.cu 2025-05-07T19:51:24.4603416Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu 2025-05-07T19:51:24.4604153Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu 2025-05-07T19:51:24.4604879Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cu 2025-05-07T19:51:24.4605591Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cuh 2025-05-07T19:51:24.4606306Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/utility.cuh 2025-05-07T19:51:24.4606795Z 2025-05-07T19:51:24.4606972Z HIP_SPECIFIC_SRCS: 2025-05-07T19:51:24.4607346Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gemm/ck_extensions.hip 2025-05-07T19:51:24.4607884Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gemm/gemm.cpp 2025-05-07T19:51:24.4608580Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/bf16_grouped_gemm.hip 2025-05-07T19:51:24.4609748Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x32x128_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4611192Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x32x64_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4612638Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x32x64_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4614067Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x64x128_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v1.hip 2025-05-07T19:51:24.4615587Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x64x128_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v2.hip 2025-05-07T19:51:24.4617258Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x64x128_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:24.4618882Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:24.4620385Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4621870Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4623347Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4624830Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x64_16x16_1x3_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4626301Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x16x64_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4627774Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x64x128_32x32_1x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4629297Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x64x128_32x32_1x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4630671Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x96x128_16x16_2x3_16x8x1_16x8x1_1x32x1x4_8x8x1_2x1_intrawave_v2.hip 2025-05-07T19:51:24.4632038Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x64x128x64_32x32_2x2_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4633407Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x64x96x64_16x16_4x3_8x16x1_8x16x1_1x32x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4634793Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x128x128_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4636175Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x128x64_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:24.4637563Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x128x64_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4638943Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x224x64_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4640326Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x256x64_32x32_4x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4641703Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x96x64_16x16_4x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4643092Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x16x128x128_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:24.4644480Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x16x128x128_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:24.4645972Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x16x64x128_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4647361Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x224x256x32_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4648722Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x128x32_32x32_4x2_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:24.4650106Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x160x64_16x16_8x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4651504Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x192x64_32x32_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4652872Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x224x64_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4654255Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x256x64_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4655708Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x128x128_16x16_1x4_16x16x1_16x16x1_1x32x1x8_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:24.4657377Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x224x64_16x16_1x7_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4658860Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x96x64_16x16_1x3_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:24.4660345Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x96x64_16x16_1x3_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4661833Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x64x128x128_32x32_2x1_16x16x1_16x16x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4663342Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x64x192x128_16x16_4x3_16x16x1_16x16x1_1x32x1x8_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4664836Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x64x96x64_16x16_2x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4666301Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x16x128_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4667782Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x16x128_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4669280Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x16x64_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4670634Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x32x128_16x16_1x2_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:24.4672154Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x48x128_16x16_1x3_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4675149Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x64x128_16x16_1x4_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:24.4676277Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/ck_utility.hip 2025-05-07T19:51:24.4677044Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_blockwise_gemm.hip 2025-05-07T19:51:24.4677879Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/fp8_rowwise_gemm.hip 2025-05-07T19:51:24.4679074Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x128x16x128_16x16_4x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4680546Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x128x32x128_32x32_2x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4682007Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4683514Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2_4_split_k.hip 2025-05-07T19:51:24.4685050Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2_8_split_k.hip 2025-05-07T19:51:24.4686543Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4688091Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4689460Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2_8_split_k.hip 2025-05-07T19:51:24.4690849Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x256_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4692205Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4693582Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_interwave_v2_2_split_k.hip 2025-05-07T19:51:24.4694976Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4696755Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2_2_split_k.hip 2025-05-07T19:51:24.4698253Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4699732Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x128x128_32x32_1x2_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4701205Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v1.hip 2025-05-07T19:51:24.4702777Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4704309Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x256_16x16_1x1_16x8x1_16x8x1_1x32x1x4_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:24.4705773Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x512_16x16_1x1_32x4x1_32x4x1_1x32x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4707231Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x512_16x16_1x1_32x4x1_32x4x1_1x32x1x4_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4708771Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x64x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4710150Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x64x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4711514Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x64x32x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4712880Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x64x32x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4714243Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_16x16_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4715624Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:24.4717023Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4718388Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v5.hip 2025-05-07T19:51:24.4719781Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x256_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4721175Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x64_32x32_2x2_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_intrawave_v4.hip 2025-05-07T19:51:24.4722542Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x160x128_16x16_4x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4723936Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x160x128_32x32_1x5_8x32x1_8x32x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4725331Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x192x128_32x32_2x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4726705Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x256x128_32x32_2x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4728096Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x64x128_32x32_2x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4729552Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x64x256_32x32_2x1_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4730965Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x96x128_16x16_4x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4732363Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x96x256_32x32_1x3_16x16x1_16x16x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4733752Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x160x128x128_16x16_5x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4735127Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x160x256x128_16x16_5x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4736814Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x160x96x128_16x16_5x3_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4738337Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x16x64x128_16x16_1x1_16x16x1_8x32x1_1x16x1x16_4x4x1_1x1_intrawave_v2_8_split_k.hip 2025-05-07T19:51:24.4739875Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4741360Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4742832Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x128x128_16x16_6x4_8x32x1_8x32x1_1x32x1x8_8x8x1_2x2_intrawave_v3.hip 2025-05-07T19:51:24.4744326Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x192x128_16x16_6x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4745814Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x224x128_16x16_6x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4747281Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x256x128_16x16_6x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4748833Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x256x128_16x16_6x8_8x32x1_8x32x1_1x32x1x8_8x8x1_2x2_intrawave_v3.hip 2025-05-07T19:51:24.4750221Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x224x160x128_16x16_7x5_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4751593Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x224x192x128_16x16_7x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4752969Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4754346Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x128x128_16x16_8x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4755707Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x128x128_32x32_4x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4757140Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x160x128_16x16_8x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4758575Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x192x128_16x16_8x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4759952Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x192x128_32x32_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4761336Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4762714Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x256x128_16x16_8x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4764086Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x256x64_16x16_8x8_4x64x1_4x64x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4765472Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x256x64_32x32_4x4_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_intrawave_v4.hip 2025-05-07T19:51:24.4766838Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x96x128_16x16_8x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4768191Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x96x128_32x32_2x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4769569Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x32x128x256_32x32_1x1_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4771090Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x32x64x512_16x16_1x2_32x8x1_32x8x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4772439Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x128x128_32x32_1x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4773812Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x128x256_32x32_1x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4775617Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x16x512_16x16_1x1_32x8x1_32x8x1_1x64x1x4_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4777084Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x192x128_32x32_1x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4778582Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x192x256_32x32_1x3_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4780075Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x256x128_32x32_1x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4781539Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x64x128_32x32_1x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4783020Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x64x512_32x32_1x1_32x8x1_32x8x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4784602Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x96x256_16x16_2x3_16x16x1_16x16x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4786153Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x80x128x256_16x16_5x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4787659Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x96x128x128_16x16_3x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4789173Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:24.4790520Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4791880Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4793229Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x4x1x16_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4794592Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x512_16x16_1x1_32x2x1_32x2x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4795940Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x512_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4797269Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x64_16x16_1x1_4x16x1_4x16x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4798424Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/fp8_rowwise_batched_gemm.hip 2025-05-07T19:51:24.4799656Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:24.4801138Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:51:24.4802618Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4804104Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4805568Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4807051Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4808534Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x512_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4809998Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x512_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4811542Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x32x128x128_32x32_1x2_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4813067Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x32x64x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4814546Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4816295Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v4.hip 2025-05-07T19:51:24.4817913Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v5.hip 2025-05-07T19:51:24.4819527Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x256_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4821150Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x160x128_32x32_1x5_8x32x1_8x32x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4822772Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x192x128_32x32_2x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4824373Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x256x128_32x32_2x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4825996Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x64x128_32x32_2x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4827631Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x96x256_32x32_1x3_16x16x1_16x16x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4829268Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4830765Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4832276Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x128x128_16x16_8x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4833769Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x160x128_16x16_8x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4835268Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x192x128_16x16_8x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4836770Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4838258Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x256x128_16x16_8x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4839826Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x32x128x256_32x32_1x1_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4841362Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x32x64x512_16x16_1x2_32x8x1_32x8x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4842843Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x128x256_32x32_1x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4844346Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x192x256_32x32_1x3_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4845843Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x64x128_32x32_1x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4847320Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x64x512_32x32_1x1_32x8x1_32x8x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4848802Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_32x2x1_32x2x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4850277Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_32x2x1_32x2x1_1x16x1x4_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4851717Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4_1x1_interwave_v1.hip 2025-05-07T19:51:24.4853158Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4_1x1_interwave_v2.hip 2025-05-07T19:51:24.4854361Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/fp8_rowwise_grouped_gemm.hip 2025-05-07T19:51:24.4855621Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4857401Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4859010Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x256_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4860615Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4862234Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4863845Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4865438Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x64x256_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v1.hip 2025-05-07T19:51:24.4867051Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x64x256_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v2.hip 2025-05-07T19:51:24.4868828Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x64x256_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:24.4870313Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x96x256_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4871801Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x32x16x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4873294Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x32x64x256_16x16_1x4_16x8x1_16x8x1_1x32x1x4_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:24.4874915Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x32x64x256_32x32_1x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:24.4876688Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x64x64x256_32x32_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4878254Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x64x64x256_32x32_2x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4879808Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:24.4881397Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4882999Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x128x256_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4884577Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x224x128_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4886167Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x256x128_32x32_4x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4887832Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x96x128_16x16_4x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4889322Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x128x256_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:24.4890835Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x128x256_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:51:24.4892342Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x128x256_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4893822Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x256_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4895450Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x256_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4897313Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x256_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4898917Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4900528Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x192x96x128_16x16_6x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4902141Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:51:24.4903755Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x128x64_32x32_4x2_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:51:24.4905371Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x160x128_32x32_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4906995Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x192x128_32x32_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4908674Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4910178Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4911680Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_8x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4913160Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x128x128_16x16_1x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_interwave_v2.hip 2025-05-07T19:51:24.4914650Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x160x128_16x16_1x5_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4916143Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x160x128_16x16_1x5_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4917635Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x256x128_16x16_1x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:24.4919127Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x32x512_16x16_1x1_32x8x1_32x8x1_1x32x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4920616Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x32x512_16x16_1x1_32x8x1_32x8x1_1x32x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:51:24.4922084Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x64x512_16x16_2x1_32x8x1_32x8x1_1x32x1x8_8x8x1_2x1_intrawave_v2.hip 2025-05-07T19:51:24.4923653Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x128x256_32x32_1x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4925219Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x128x256_32x32_2x1_16x16x1_16x16x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:51:24.4926709Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x160x128_16x16_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4928196Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x192x128_16x16_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:51:24.4929678Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4931137Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:51:24.4932606Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:51:24.4934088Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x32x256_16x16_1x2_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:24.4935602Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x64x256_16x16_1x4_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_interwave_v1.hip 2025-05-07T19:51:24.4937365Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x64x256_16x16_1x4_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:51:24.4938576Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_tensorwise_gemm.hip 2025-05-07T19:51:24.4939111Z 2025-05-07T19:51:24.4939320Z OTHER_SRCS: 2025-05-07T19:51:24.4939441Z 2025-05-07T19:51:24.4939523Z 2025-05-07T19:51:24.4939723Z CC_FLAGS: 2025-05-07T19:51:24.4939836Z 2025-05-07T19:51:24.4939932Z 2025-05-07T19:51:24.4940112Z NVCC_FLAGS: 2025-05-07T19:51:24.4940228Z 2025-05-07T19:51:24.4940322Z 2025-05-07T19:51:24.4940505Z HIPCC_FLAGS: 2025-05-07T19:51:24.4940628Z 2025-05-07T19:51:24.4940723Z 2025-05-07T19:51:24.4940901Z INCLUDE_DIRS: 2025-05-07T19:51:24.4941129Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:24.4941432Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:51:24.4941720Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:51:24.4942021Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:24.4942516Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include 2025-05-07T19:51:24.4943317Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:51:24.4943956Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:51:24.4944367Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:51:24.4944788Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:51:24.4945261Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:51:24.4945767Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 2025-05-07T19:51:24.4946228Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:51:24.4946785Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include 2025-05-07T19:51:24.4947451Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize 2025-05-07T19:51:24.4947818Z 2025-05-07T19:51:24.4948010Z Selected Source Files: 2025-05-07T19:51:24.4948506Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/attention.cpp 2025-05-07T19:51:24.4949150Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cpp 2025-05-07T19:51:24.4949702Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cpp 2025-05-07T19:51:24.4950216Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cpp 2025-05-07T19:51:24.4950761Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cpp 2025-05-07T19:51:24.4951360Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cpp 2025-05-07T19:51:24.4951895Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cpp 2025-05-07T19:51:24.4952463Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/gqa_attn_splitk.cu 2025-05-07T19:51:24.4953023Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cu 2025-05-07T19:51:24.4953553Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu 2025-05-07T19:51:24.4954052Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cu 2025-05-07T19:51:24.4954583Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu 2025-05-07T19:51:24.4955171Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cu 2025-05-07T19:51:24.4955714Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cu 2025-05-07T19:51:24.4956381Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu 2025-05-07T19:51:24.4957129Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16.cu 2025-05-07T19:51:24.4957912Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu 2025-05-07T19:51:24.4958775Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu 2025-05-07T19:51:24.4959536Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16.cu 2025-05-07T19:51:24.4960338Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:51:24.4961219Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:51:24.4962113Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T19:51:24.4963003Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T19:51:24.4963871Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T19:51:24.4964777Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T19:51:24.4965668Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T19:51:24.4966547Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T19:51:24.4967439Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T19:51:24.4968314Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T19:51:24.4969200Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T19:51:24.4970093Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T19:51:24.4971021Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T19:51:24.4971915Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T19:51:24.4972838Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T19:51:24.4973733Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T19:51:24.4975154Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:51:24.4976459Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:51:24.4977412Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T19:51:24.4978356Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T19:51:24.4979307Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T19:51:24.4980261Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T19:51:24.4981198Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T19:51:24.4982145Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T19:51:24.4982987Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16.cu 2025-05-07T19:51:24.4983760Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu 2025-05-07T19:51:24.4984584Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu 2025-05-07T19:51:24.4985366Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_lite.cu 2025-05-07T19:51:24.4986172Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu 2025-05-07T19:51:24.4987153Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu 2025-05-07T19:51:24.4988394Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu 2025-05-07T19:51:24.4989453Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu 2025-05-07T19:51:24.4990496Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu 2025-05-07T19:51:24.4991559Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu 2025-05-07T19:51:24.4992617Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu 2025-05-07T19:51:24.4993659Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu 2025-05-07T19:51:24.4994708Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu 2025-05-07T19:51:24.4995741Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu 2025-05-07T19:51:24.4996954Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu 2025-05-07T19:51:24.4998367Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu 2025-05-07T19:51:24.4999551Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu 2025-05-07T19:51:24.5000614Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu 2025-05-07T19:51:24.5001655Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu 2025-05-07T19:51:24.5002569Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu 2025-05-07T19:51:24.5003384Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu 2025-05-07T19:51:24.5004148Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu 2025-05-07T19:51:24.5004918Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu 2025-05-07T19:51:24.5005724Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu 2025-05-07T19:51:24.5006477Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16.cu 2025-05-07T19:51:24.5007198Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu 2025-05-07T19:51:24.5007946Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.cu 2025-05-07T19:51:24.5008660Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16_fast_gemv.cu 2025-05-07T19:51:24.5009352Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu 2025-05-07T19:51:24.5010057Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu 2025-05-07T19:51:24.5010755Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cu 2025-05-07T19:51:24.5011431Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cuh 2025-05-07T19:51:24.5012116Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/utility.cuh 2025-05-07T19:51:24.5012589Z 2025-05-07T19:51:24.5012769Z HIPified Source Files: 2025-05-07T19:51:24.5012915Z 2025-05-07T19:51:24.5013003Z 2025-05-07T19:51:24.5013185Z Library Dependencies: 2025-05-07T19:51:24.5013411Z torch 2025-05-07T19:51:24.5013589Z torch_library 2025-05-07T19:51:24.5014004Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so 2025-05-07T19:51:24.5014805Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:51:24.5015551Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:51:24.5016527Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:51:24.5017260Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:51:24.5017876Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:51:24.5018268Z 2025-05-07T19:51:24.5018470Z Output Library: 2025-05-07T19:51:24.5018709Z fbgemm_gpu_experimental_gen_ai 2025-05-07T19:51:24.5018975Z 2025-05-07T19:51:24.5019168Z Destination Directory: 2025-05-07T19:51:24.5019331Z 2025-05-07T19:51:24.5019447Z ================================================================================ 2025-05-07T19:51:24.5019677Z 2025-05-07T19:51:24.5019681Z 2025-05-07T19:51:24.5019685Z 2025-05-07T19:51:24.5019809Z ================================================================================ 2025-05-07T19:51:24.5020172Z Adding to Package: fbgemm_gpu/experimental/gen_ai 2025-05-07T19:51:24.5021259Z 2025-05-07T19:51:24.5021442Z TARGETS: 2025-05-07T19:51:24.5021672Z fbgemm_gpu_experimental_gen_ai 2025-05-07T19:51:24.5021927Z 2025-05-07T19:51:24.5022127Z FILES: 2025-05-07T19:51:24.5022235Z 2025-05-07T19:51:24.5022435Z ================================================================================ 2025-05-07T19:51:24.5022662Z 2025-05-07T19:51:24.5022715Z 2025-05-07T19:51:24.5022718Z 2025-05-07T19:51:24.5022846Z ================================================================================ 2025-05-07T19:51:24.5023272Z GPU CPP Library Target: fbgemm_gpu_experimental_example_py (SHARED) 2025-05-07T19:51:24.5023670Z 2025-05-07T19:51:24.5023853Z CPU_SRCS: 2025-05-07T19:51:24.5023985Z 2025-05-07T19:51:24.5024065Z 2025-05-07T19:51:24.5024247Z GPU_SRCS: 2025-05-07T19:51:24.5024596Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_nccl.cpp 2025-05-07T19:51:24.5025146Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_ops.cpp 2025-05-07T19:51:24.5025729Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/cutlass_sgemm_nn.cu 2025-05-07T19:51:24.5026168Z 2025-05-07T19:51:24.5026356Z CUDA_SPECIFIC_SRCS: 2025-05-07T19:51:24.5026497Z 2025-05-07T19:51:24.5026589Z 2025-05-07T19:51:24.5026779Z HIP_SPECIFIC_SRCS: 2025-05-07T19:51:24.5026939Z 2025-05-07T19:51:24.5027017Z 2025-05-07T19:51:24.5027206Z OTHER_SRCS: 2025-05-07T19:51:24.5027345Z 2025-05-07T19:51:24.5027425Z 2025-05-07T19:51:24.5027605Z CC_FLAGS: 2025-05-07T19:51:24.5037359Z 2025-05-07T19:51:24.5037498Z 2025-05-07T19:51:24.5037710Z NVCC_FLAGS: 2025-05-07T19:51:24.5037827Z 2025-05-07T19:51:24.5038083Z 2025-05-07T19:51:24.5038256Z HIPCC_FLAGS: 2025-05-07T19:51:24.5038413Z 2025-05-07T19:51:24.5038498Z 2025-05-07T19:51:24.5038673Z INCLUDE_DIRS: 2025-05-07T19:51:24.5038911Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:24.5039219Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:51:24.5039504Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:51:24.5039814Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:51:24.5040308Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include 2025-05-07T19:51:24.5041100Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:51:24.5041738Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:51:24.5042156Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:51:24.5042572Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:51:24.5043045Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:51:24.5043655Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 2025-05-07T19:51:24.5044097Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:51:24.5044639Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include 2025-05-07T19:51:24.5045118Z 2025-05-07T19:51:24.5045314Z Selected Source Files: 2025-05-07T19:51:24.5045672Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_nccl.cpp 2025-05-07T19:51:24.5046209Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_ops.cpp 2025-05-07T19:51:24.5046739Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/cutlass_sgemm_nn.cu 2025-05-07T19:51:24.5047156Z 2025-05-07T19:51:24.5047338Z HIPified Source Files: 2025-05-07T19:51:24.5047494Z 2025-05-07T19:51:24.5047565Z 2025-05-07T19:51:24.5047748Z Library Dependencies: 2025-05-07T19:51:24.5047968Z torch 2025-05-07T19:51:24.5048157Z torch_library 2025-05-07T19:51:24.5048573Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so 2025-05-07T19:51:24.5049232Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:51:24.5049898Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:51:24.5050774Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:51:24.5051572Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:51:24.5052120Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:51:24.5052487Z 2025-05-07T19:51:24.5052709Z Output Library: 2025-05-07T19:51:24.5052933Z fbgemm_gpu_experimental_example_py 2025-05-07T19:51:24.5053170Z 2025-05-07T19:51:24.5053345Z Destination Directory: 2025-05-07T19:51:24.5053483Z 2025-05-07T19:51:24.5053585Z ================================================================================ 2025-05-07T19:51:24.5053800Z 2025-05-07T19:51:24.5053804Z 2025-05-07T19:51:24.5053807Z 2025-05-07T19:51:24.5053909Z ================================================================================ 2025-05-07T19:51:24.5054252Z Adding to Package: fbgemm_gpu/experimental/example 2025-05-07T19:51:24.5054545Z 2025-05-07T19:51:24.5054712Z TARGETS: 2025-05-07T19:51:24.5054906Z fbgemm_gpu_experimental_example_py 2025-05-07T19:51:24.5055163Z 2025-05-07T19:51:24.5055319Z FILES: 2025-05-07T19:51:24.5055730Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/example/__init__.py 2025-05-07T19:51:24.5056500Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/example/utils.py 2025-05-07T19:51:24.5056933Z ================================================================================ 2025-05-07T19:51:24.5057155Z 2025-05-07T19:51:24.5057159Z 2025-05-07T19:51:24.5057162Z 2025-05-07T19:51:24.5057282Z ================================================================================ 2025-05-07T19:51:24.5057672Z Adding to Package: fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T19:51:24.5058027Z 2025-05-07T19:51:24.5058197Z TARGETS: 2025-05-07T19:51:24.5058314Z 2025-05-07T19:51:24.5058390Z 2025-05-07T19:51:24.5058552Z FILES: 2025-05-07T19:51:24.5058881Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py 2025-05-07T19:51:24.5059429Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py 2025-05-07T19:51:24.5059990Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py 2025-05-07T19:51:24.5060598Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py 2025-05-07T19:51:24.5061159Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py 2025-05-07T19:51:24.5061589Z ================================================================================ 2025-05-07T19:51:24.5061810Z 2025-05-07T19:51:24.5061910Z -- Configuring done (8.6s) 2025-05-07T19:51:24.5062182Z -- Generating done (0.0s) 2025-05-07T19:51:24.5062691Z -- Build files have been written to: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build 2025-05-07T19:51:24.5182175Z Change Dir: '/__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build' 2025-05-07T19:51:24.5182871Z 2025-05-07T19:51:24.5183212Z Run Build Command(s): /github/home/miniconda/envs/build_binary/bin/ninja -v -j 48 install 2025-05-07T19:51:24.6320667Z [1/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp 2025-05-07T19:51:24.6332880Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.6532141Z [2/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp 2025-05-07T19:51:24.6543866Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.6579038Z [3/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp 2025-05-07T19:51:24.6590451Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.6611683Z [4/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp 2025-05-07T19:51:24.6623504Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.6760056Z [5/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp 2025-05-07T19:51:24.6771453Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.6782962Z [6/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp 2025-05-07T19:51:24.6794233Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.6859843Z [7/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp 2025-05-07T19:51:24.6871462Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.6882749Z [8/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp 2025-05-07T19:51:24.6893852Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.6905201Z [9/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp 2025-05-07T19:51:24.6916447Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.7019282Z [10/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp 2025-05-07T19:51:24.7033355Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.7172314Z [11/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp 2025-05-07T19:51:24.7184191Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.7354155Z [12/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp 2025-05-07T19:51:24.7365472Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.7484084Z [13/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp 2025-05-07T19:51:24.7495248Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.7563389Z [14/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp 2025-05-07T19:51:24.7575000Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.7586153Z [15/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp 2025-05-07T19:51:24.7597199Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.7644687Z [16/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp 2025-05-07T19:51:24.7656420Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.7691677Z [17/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp 2025-05-07T19:51:24.7703642Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.7767955Z [18/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp 2025-05-07T19:51:24.7779681Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.8147375Z [19/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp 2025-05-07T19:51:24.8159146Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.8203493Z [20/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp 2025-05-07T19:51:24.8215603Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.8513888Z [21/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp 2025-05-07T19:51:24.8525202Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.8541441Z [22/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp 2025-05-07T19:51:24.8552759Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.8618350Z [23/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp 2025-05-07T19:51:24.8629484Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.8640480Z [24/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp 2025-05-07T19:51:24.8652484Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.8669923Z [25/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp 2025-05-07T19:51:24.8681406Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.8692700Z [26/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp 2025-05-07T19:51:24.8704373Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.8824655Z [27/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp 2025-05-07T19:51:24.8836203Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.8973805Z [28/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp 2025-05-07T19:51:24.8985845Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.8996903Z [29/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp 2025-05-07T19:51:24.9008369Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.9140569Z [30/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp 2025-05-07T19:51:24.9152375Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.9164908Z [31/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp 2025-05-07T19:51:24.9176643Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.9219965Z [32/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp 2025-05-07T19:51:24.9310670Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.9322564Z [33/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp 2025-05-07T19:51:24.9337900Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.9391960Z [34/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp 2025-05-07T19:51:24.9403459Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.9557059Z [35/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp 2025-05-07T19:51:24.9568953Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.9704484Z [36/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp 2025-05-07T19:51:24.9716514Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:24.9868511Z [37/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp 2025-05-07T19:51:24.9880356Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.0030445Z [38/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp 2025-05-07T19:51:25.0041971Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.0179582Z [39/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp 2025-05-07T19:51:25.0192196Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.0436598Z [40/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp 2025-05-07T19:51:25.0449095Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.0583529Z [41/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp 2025-05-07T19:51:25.0596013Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.0607642Z [42/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp 2025-05-07T19:51:25.0619788Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.0631423Z [43/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp 2025-05-07T19:51:25.0643400Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.0912310Z [44/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp 2025-05-07T19:51:25.0925121Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.1560722Z [45/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp 2025-05-07T19:51:25.1572949Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.2221805Z [46/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp 2025-05-07T19:51:25.2234318Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.2586726Z [47/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp 2025-05-07T19:51:25.2593120Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.2764223Z [48/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp 2025-05-07T19:51:25.2775679Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.2797285Z [49/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp 2025-05-07T19:51:25.2808303Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.3219574Z [50/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp 2025-05-07T19:51:25.3232372Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.4002339Z [51/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp 2025-05-07T19:51:25.4014508Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.4223306Z [52/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp 2025-05-07T19:51:25.4235752Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.5020463Z [53/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp 2025-05-07T19:51:25.5033210Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.5592677Z [54/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp 2025-05-07T19:51:25.5604834Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.6546947Z [55/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp 2025-05-07T19:51:25.6559258Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.6948538Z [56/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp 2025-05-07T19:51:25.6960924Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.7390359Z [57/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -mavx512f -mavx512bw -mavx512dq -mavx512vl -fopenmp -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc 2025-05-07T19:51:25.7400119Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.7473046Z [58/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc 2025-05-07T19:51:25.7490648Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.9022618Z [59/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp 2025-05-07T19:51:25.9034667Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:25.9932632Z [60/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp 2025-05-07T19:51:25.9945719Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:26.0840606Z [61/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o -c /__w/FBGEMM/FBGEMM/src/QuantUtils.cc 2025-05-07T19:51:26.0858785Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:26.3033387Z [62/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp 2025-05-07T19:51:26.3045827Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:26.9280923Z [63/156] : && /github/home/miniconda/envs/build_binary/bin/c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,asmjit.so -o asmjit.so CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib:/github/home/miniconda/envs/build_binary/lib/stubs: /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so" -Wl,--as-needed && : 2025-05-07T19:51:26.9353176Z [64/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T19:51:26.9354384Z ################################################################################ 2025-05-07T19:51:26.9354787Z [CMAKE] Running post-build script ... 2025-05-07T19:51:26.9355330Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T19:51:26.9355840Z Removing all RPATHs ... 2025-05-07T19:51:26.9356152Z ################################################################################ 2025-05-07T19:51:27.1759740Z [65/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o -c /__w/FBGEMM/FBGEMM/src/Utils.cc 2025-05-07T19:51:27.5992988Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:27.6008560Z [66/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o -c /__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc 2025-05-07T19:51:27.6024929Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:30.8878238Z [67/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o -c /__w/FBGEMM/FBGEMM/src/RefImplementations.cc 2025-05-07T19:51:30.8895146Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:31.4497986Z [68/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o -c /__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc 2025-05-07T19:51:31.4514527Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:32.9937536Z [69/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc 2025-05-07T19:51:32.9953446Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:33.4784372Z [70/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cpp 2025-05-07T19:51:33.4803560Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:33.5366253Z [71/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/attention.cpp 2025-05-07T19:51:33.5384699Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:33.6247421Z [72/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cpp 2025-05-07T19:51:33.6265385Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:33.8567965Z [73/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cpp 2025-05-07T19:51:33.8587740Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:35.2829515Z [74/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cpp 2025-05-07T19:51:35.2848598Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:35.9143483Z [75/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cpp 2025-05-07T19:51:35.9163011Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:36.3799824Z [76/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o -c /__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc 2025-05-07T19:51:36.3817527Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:36.6483646Z [77/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cpp 2025-05-07T19:51:36.6501465Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:51:50.3418398Z [78/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc 2025-05-07T19:51:50.3435173Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:55:39.7204698Z [79/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o 2025-05-07T19:55:39.7333834Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:57:18.3913055Z [80/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o 2025-05-07T19:57:18.3933904Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:57:18.4239579Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4241894Z static auto dtype() { 2025-05-07T19:57:18.4242321Z ^ 2025-05-07T19:57:18.4242533Z 2025-05-07T19:57:18.4242951Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:18.4243514Z 2025-05-07T19:57:18.4244773Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4246366Z static auto dtype() { 2025-05-07T19:57:18.4246750Z ^ 2025-05-07T19:57:18.4247110Z 2025-05-07T19:57:18.4248425Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4250146Z static auto dtype() { 2025-05-07T19:57:18.4250556Z ^ 2025-05-07T19:57:18.4250762Z 2025-05-07T19:57:18.4252044Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4253659Z static auto dtype() { 2025-05-07T19:57:18.4254093Z ^ 2025-05-07T19:57:18.4254311Z 2025-05-07T19:57:18.4254726Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:18.4255434Z 2025-05-07T19:57:18.4256721Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4258385Z static auto dtype() { 2025-05-07T19:57:18.4258780Z ^ 2025-05-07T19:57:18.4259028Z 2025-05-07T19:57:18.4260397Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4262114Z static auto dtype() { 2025-05-07T19:57:18.4262525Z ^ 2025-05-07T19:57:18.4262734Z 2025-05-07T19:57:18.4264034Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4265645Z static auto dtype() { 2025-05-07T19:57:18.4266077Z ^ 2025-05-07T19:57:18.4266290Z 2025-05-07T19:57:18.4266707Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:18.4267282Z 2025-05-07T19:57:18.4268599Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4270247Z static auto dtype() { 2025-05-07T19:57:18.4270636Z ^ 2025-05-07T19:57:18.4270875Z 2025-05-07T19:57:18.4272253Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4282686Z static auto dtype() { 2025-05-07T19:57:18.4283242Z ^ 2025-05-07T19:57:18.4283516Z 2025-05-07T19:57:18.4284842Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4286764Z static auto dtype() { 2025-05-07T19:57:18.4287203Z ^ 2025-05-07T19:57:18.4368320Z 2025-05-07T19:57:18.4419224Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:18.4420943Z 2025-05-07T19:57:18.4422221Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4423704Z static auto dtype() { 2025-05-07T19:57:18.4424071Z ^ 2025-05-07T19:57:18.4424379Z 2025-05-07T19:57:18.4425821Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4453486Z static auto dtype() { 2025-05-07T19:57:18.4454341Z ^ 2025-05-07T19:57:18.4454569Z 2025-05-07T19:57:18.4456030Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4457626Z static auto dtype() { 2025-05-07T19:57:18.4458052Z ^ 2025-05-07T19:57:18.4458266Z 2025-05-07T19:57:18.4458749Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:18.4459303Z 2025-05-07T19:57:18.4460536Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4462112Z static auto dtype() { 2025-05-07T19:57:18.4462512Z ^ 2025-05-07T19:57:18.4462756Z 2025-05-07T19:57:18.4464126Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4523470Z static auto dtype() { 2025-05-07T19:57:18.4524362Z ^ 2025-05-07T19:57:18.4524549Z 2025-05-07T19:57:18.4525745Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4527043Z static auto dtype() { 2025-05-07T19:57:18.4527417Z ^ 2025-05-07T19:57:18.4527599Z 2025-05-07T19:57:18.4528042Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:18.4528523Z 2025-05-07T19:57:18.4529566Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4574220Z static auto dtype() { 2025-05-07T19:57:18.4575403Z ^ 2025-05-07T19:57:18.4575662Z 2025-05-07T19:57:18.4577106Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:18.4578771Z static auto dtype() { 2025-05-07T19:57:18.4579159Z ^ 2025-05-07T19:57:18.4579370Z 2025-05-07T19:57:18.4597508Z [81/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o 2025-05-07T19:57:18.4616820Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:57:20.1423730Z [82/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o 2025-05-07T19:57:20.1436287Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:57:21.2016748Z [83/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o 2025-05-07T19:57:21.2038198Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:57:21.6874559Z [84/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_lite.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o 2025-05-07T19:57:21.6897056Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:57:24.0997794Z [85/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc 2025-05-07T19:57:24.1014935Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:57:27.8622283Z [86/156] : && /github/home/miniconda/envs/build_binary/bin/c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,fbgemm.so -o fbgemm.so CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -Wl,-rpath,"\$ORIGIN" /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so asmjit.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so && : 2025-05-07T19:57:28.2920511Z [87/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 1 2025-05-07T19:57:28.2922839Z ################################################################################ 2025-05-07T19:57:28.2923466Z [CMAKE] Running post-build script ... 2025-05-07T19:57:28.2924974Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 2025-05-07T19:57:28.2925929Z Resetting RPATH to $ORIGIN ... 2025-05-07T19:57:28.2926518Z 0x000000000000000f (RPATH) Library rpath: [$ORIGIN] 2025-05-07T19:57:28.2927525Z ################################################################################ 2025-05-07T19:57:37.8903604Z [88/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/gqa_attn_splitk.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o 2025-05-07T19:57:37.8924786Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:57:45.9895904Z [89/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o 2025-05-07T19:57:45.9912491Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:13.4800705Z [90/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o 2025-05-07T19:58:13.4822665Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:13.4825426Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:13.4827524Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:13.4828342Z ^ 2025-05-07T19:58:13.4828685Z 2025-05-07T19:58:13.4829146Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:13.4829647Z 2025-05-07T19:58:13.4831174Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:13.4833388Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:13.4834231Z ^ 2025-05-07T19:58:13.4834586Z 2025-05-07T19:58:14.8685717Z [91/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o 2025-05-07T19:58:14.8708772Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:14.8711571Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:14.8713760Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:14.8714614Z ^ 2025-05-07T19:58:14.8714984Z 2025-05-07T19:58:14.8715442Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:14.8716119Z 2025-05-07T19:58:14.8717570Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:14.8719644Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:14.8720520Z ^ 2025-05-07T19:58:14.8720841Z 2025-05-07T19:58:17.4672979Z [92/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o 2025-05-07T19:58:17.4702086Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:17.4705167Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:17.4921386Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:17.4922280Z ^ 2025-05-07T19:58:17.4922637Z 2025-05-07T19:58:17.4923090Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:17.4923716Z 2025-05-07T19:58:17.4925281Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:17.4927395Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:17.4928218Z ^ 2025-05-07T19:58:17.4928528Z 2025-05-07T19:58:26.7005051Z [93/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o 2025-05-07T19:58:26.7031319Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:26.7034781Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:26.7037046Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:26.7037994Z ^ 2025-05-07T19:58:26.7038372Z 2025-05-07T19:58:26.7038856Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:26.7039548Z 2025-05-07T19:58:26.7041165Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:26.7043434Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:26.7044336Z ^ 2025-05-07T19:58:26.7044680Z 2025-05-07T19:58:27.4051711Z [94/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o 2025-05-07T19:58:27.4071861Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:39.2563052Z [95/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o 2025-05-07T19:58:39.2584079Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:39.2586968Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:39.2589403Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:39.2590526Z ^ 2025-05-07T19:58:39.2590791Z 2025-05-07T19:58:39.2591205Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:39.2591806Z 2025-05-07T19:58:39.2593402Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:39.2595844Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:39.2596970Z ^ 2025-05-07T19:58:39.2597302Z 2025-05-07T19:58:39.2598801Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:39.2601565Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:39.2602668Z ^ 2025-05-07T19:58:39.2602917Z 2025-05-07T19:58:39.2603497Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:39.2604151Z 2025-05-07T19:58:39.2605676Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:39.2608137Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:39.2609164Z ^ 2025-05-07T19:58:39.2609531Z 2025-05-07T19:58:39.2610649Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced 2025-05-07T19:58:39.2612170Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8; 2025-05-07T19:58:39.2612691Z ^ 2025-05-07T19:58:39.2612912Z 2025-05-07T19:58:39.2614026Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced 2025-05-07T19:58:39.2615656Z constexpr int CVT_FP4_SF_VEC_SIZE = 16; 2025-05-07T19:58:39.2616214Z ^ 2025-05-07T19:58:39.2616449Z 2025-05-07T19:58:39.2617991Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:39.2620411Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:39.2621539Z ^ 2025-05-07T19:58:39.2621770Z 2025-05-07T19:58:39.2622178Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:39.2622807Z 2025-05-07T19:58:39.2624296Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:39.2626741Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:39.2627860Z ^ 2025-05-07T19:58:39.2628252Z 2025-05-07T19:58:39.2629420Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced 2025-05-07T19:58:39.2631021Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8; 2025-05-07T19:58:39.2631580Z ^ 2025-05-07T19:58:39.2631813Z 2025-05-07T19:58:39.2632961Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced 2025-05-07T19:58:39.2634504Z constexpr int CVT_FP4_SF_VEC_SIZE = 16; 2025-05-07T19:58:39.2635049Z ^ 2025-05-07T19:58:39.2635280Z 2025-05-07T19:58:39.2636795Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:39.2639298Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:39.2640456Z ^ 2025-05-07T19:58:39.2640719Z 2025-05-07T19:58:39.2641128Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:39.2642014Z 2025-05-07T19:58:39.2643684Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:39.2646181Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:39.2647151Z ^ 2025-05-07T19:58:39.2647535Z 2025-05-07T19:58:39.2648568Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced 2025-05-07T19:58:39.2650074Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8; 2025-05-07T19:58:39.2650538Z ^ 2025-05-07T19:58:39.2650742Z 2025-05-07T19:58:39.2651872Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced 2025-05-07T19:58:39.2653257Z constexpr int CVT_FP4_SF_VEC_SIZE = 16; 2025-05-07T19:58:39.2653792Z ^ 2025-05-07T19:58:39.2654020Z 2025-05-07T19:58:39.2655671Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:39.2658149Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:39.2659260Z ^ 2025-05-07T19:58:39.2659506Z 2025-05-07T19:58:39.2659919Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:39.2660562Z 2025-05-07T19:58:39.2662078Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:39.2664565Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:39.2665649Z ^ 2025-05-07T19:58:39.2666026Z 2025-05-07T19:58:39.2667131Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced 2025-05-07T19:58:39.2668701Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8; 2025-05-07T19:58:39.2669213Z ^ 2025-05-07T19:58:39.2669437Z 2025-05-07T19:58:39.2670544Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced 2025-05-07T19:58:39.2672002Z constexpr int CVT_FP4_SF_VEC_SIZE = 16; 2025-05-07T19:58:39.2672517Z ^ 2025-05-07T19:58:39.2672723Z 2025-05-07T19:58:39.2674128Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:39.2676856Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:39.2677964Z ^ 2025-05-07T19:58:39.2678213Z 2025-05-07T19:58:39.2678632Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:39.2679265Z 2025-05-07T19:58:39.2680816Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:39.2683593Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:39.2684702Z ^ 2025-05-07T19:58:39.2685077Z 2025-05-07T19:58:39.2686737Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __host__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:39.2689226Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:39.2690308Z ^ 2025-05-07T19:58:39.2690549Z 2025-05-07T19:58:39.2690999Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:39.2691612Z 2025-05-07T19:58:39.2693135Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include/fbgemm_gpu/utils/stochastic_rounding.cuh(32): warning #20012-D: __device__ annotation is ignored on a function("StochasticRoundingRNGState") that is explicitly defaulted on its first declaration 2025-05-07T19:58:39.2695635Z __attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) constexpr StochasticRoundingRNGState() = default; 2025-05-07T19:58:39.2696692Z ^ 2025-05-07T19:58:39.2697031Z 2025-05-07T19:58:39.2697995Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I13__nv_bfloat16Lb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-05-07T19:58:39.2700129Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I13__nv_bfloat16Lb1EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-05-07T19:58:39.2702254Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I6__halfLb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-05-07T19:58:39.2704295Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I6__halfLb1EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-05-07T19:58:48.8705562Z [96/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o 2025-05-07T19:58:48.8732621Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:48.8735857Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:48.8738083Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:48.8738947Z ^ 2025-05-07T19:58:48.8739296Z 2025-05-07T19:58:48.8739735Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:48.8740417Z 2025-05-07T19:58:48.8741998Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:48.8744255Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:48.8745108Z ^ 2025-05-07T19:58:48.8745460Z 2025-05-07T19:58:54.4706170Z [97/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o 2025-05-07T19:58:54.4730052Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:54.4733040Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:54.4735106Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:54.4735918Z ^ 2025-05-07T19:58:54.4736229Z 2025-05-07T19:58:54.4736642Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:54.4737256Z 2025-05-07T19:58:54.4738750Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:54.4740768Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:54.4741582Z ^ 2025-05-07T19:58:54.4741874Z 2025-05-07T19:58:55.1054175Z [98/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o 2025-05-07T19:58:55.1073918Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:55.1562818Z [99/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o 2025-05-07T19:58:55.1581711Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:55.1583788Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:55.1585663Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:55.1586519Z ^ 2025-05-07T19:58:55.1586748Z 2025-05-07T19:58:55.1587054Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:55.1587519Z 2025-05-07T19:58:55.1588618Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:55.1590119Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:55.1590746Z ^ 2025-05-07T19:58:55.1590966Z 2025-05-07T19:58:55.4550452Z [100/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o 2025-05-07T19:58:55.4569193Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:55.4571461Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:55.4573586Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:55.4574217Z ^ 2025-05-07T19:58:55.4574489Z 2025-05-07T19:58:55.4575265Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:55.4575870Z 2025-05-07T19:58:55.4576999Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:55.4578672Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:55.4579337Z ^ 2025-05-07T19:58:55.4579573Z 2025-05-07T19:58:57.2253737Z [101/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o 2025-05-07T19:58:57.2274564Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:57.2277270Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:57.2279193Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:57.2280372Z ^ 2025-05-07T19:58:57.2280682Z 2025-05-07T19:58:57.2281199Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:57.2281898Z 2025-05-07T19:58:57.2283650Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:57.2285418Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:57.2286079Z ^ 2025-05-07T19:58:57.2286372Z 2025-05-07T19:58:57.7874950Z [102/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o 2025-05-07T19:58:57.7898389Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:57.7901288Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:57.7903407Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:57.7904303Z ^ 2025-05-07T19:58:57.7904661Z 2025-05-07T19:58:57.7905110Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:57.7906153Z 2025-05-07T19:58:57.7907743Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:57.7910211Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:57.7911101Z ^ 2025-05-07T19:58:57.7911425Z 2025-05-07T19:58:58.3515190Z [103/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_example_py_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o -MF experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o.d -o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_nccl.cpp 2025-05-07T19:58:58.3534213Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:59:00.0583319Z [104/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o 2025-05-07T19:59:00.0604926Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:00.0607639Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:00.0609801Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:00.0610702Z ^ 2025-05-07T19:59:00.0611006Z 2025-05-07T19:59:00.0611367Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:00.0611952Z 2025-05-07T19:59:00.0613385Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:00.0615485Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:00.0616443Z ^ 2025-05-07T19:59:00.0616837Z 2025-05-07T19:59:00.0629101Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_10multipliesES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1P_INS1Q_IS1R_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEES11_NS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S25_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES29_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:00.0652501Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1G_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1N_NS1G_6fusion15Sm90TreeVisitorINS1P_11Sm90ComputeINS_10multipliesES1O_fLNS_15FloatRoundStyleE2EvEEJNS1P_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1Q_INS1R_IS1S_ffLS1T_2EvEEJNS1P_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1P_12Sm90AccFetchEEEEEEES12_NS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S26_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2A_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:00.0676524Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1P_INS1Q_INS_10multipliesEffLS1S_2EvEEJS1W_NS1P_IS1Y_JNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEES11_NS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S27_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2B_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:00.0701837Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1G_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1N_NS1G_6fusion15Sm90TreeVisitorINS1P_11Sm90ComputeINS_4plusES1O_fLNS_15FloatRoundStyleE2EvEEJNS1P_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1Q_INS1R_INS_10multipliesEffLS1T_2EvEEJS1X_NS1Q_IS1Z_JNS1P_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1P_12Sm90AccFetchEEEEEEEEEES12_NS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S28_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2C_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:00.0726605Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_S1N_LNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_S1N_S1N_NS5_IJSC_SV_SV_EEELi8ELb1EEENS1P_INS1Q_INS_10multipliesES1N_fLS1S_2EvEEJNS1U_ILi0ESI_ffS1V_Li4ELb1EEENS1P_INS1Q_IS1X_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEES11_NS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S29_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2D_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:00.0751822Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1G_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1N_NS1G_6fusion15Sm90TreeVisitorINS1P_11Sm90ComputeINS_4plusES1O_S1O_LNS_15FloatRoundStyleE2EvEEJNS1P_16Sm90ColBroadcastILi0ESI_S1O_S1O_NS5_IJSC_SW_SW_EEELi8ELb1EEENS1Q_INS1R_INS_10multipliesES1O_fLS1T_2EvEEJNS1V_ILi0ESI_ffS1W_Li4ELb1EEENS1Q_INS1R_IS1Y_ffLS1T_2EvEEJNS1P_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1P_12Sm90AccFetchEEEEEEEEEES12_NS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S2A_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2E_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:01.4736325Z [105/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o 2025-05-07T19:59:01.4759088Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:01.4761959Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:01.4764170Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:01.4764963Z ^ 2025-05-07T19:59:01.4765303Z 2025-05-07T19:59:01.4765690Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:01.4766322Z 2025-05-07T19:59:01.4767707Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:01.4769782Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:01.4770579Z ^ 2025-05-07T19:59:01.4771345Z 2025-05-07T19:59:07.2586377Z [106/156] /github/home/miniconda/envs/build_binary/bin/c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_example_py_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o -MF experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o.d -o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_ops.cpp 2025-05-07T19:59:07.2604835Z clang-16: warning: argument unused during compilation: '-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib' [-Wunused-command-line-argument] 2025-05-07T19:59:14.0725050Z [107/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o 2025-05-07T19:59:14.0853956Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:16.4926701Z [108/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o 2025-05-07T19:59:16.5002752Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:16.5006460Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:16.5008879Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:16.5009755Z ^ 2025-05-07T19:59:16.5010110Z 2025-05-07T19:59:16.5010631Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:16.5011318Z 2025-05-07T19:59:16.5012948Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:16.5015197Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:16.5016175Z ^ 2025-05-07T19:59:16.5016496Z 2025-05-07T19:59:31.8881427Z [109/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o 2025-05-07T19:59:31.8903817Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:31.8906720Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:31.8965201Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:31.8966879Z ^ 2025-05-07T19:59:31.8967238Z 2025-05-07T19:59:31.8967684Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:31.8968300Z 2025-05-07T19:59:31.9028291Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:31.9030907Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:31.9087850Z ^ 2025-05-07T19:59:31.9088768Z 2025-05-07T19:59:31.9146720Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES11_S1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1E_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1L_NS1E_6fusion15Sm90TreeVisitorINS1N_11Sm90ComputeINS_10multipliesES1M_fLNS_15FloatRoundStyleE2EvEEJNS1N_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1O_INS1P_IS1Q_ffLS1R_2EvEEJNS1N_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1N_12Sm90AccFetchEEEEEEENS4_13SM90_TMA_LOADENS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S25_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES29_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:31.9187583Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES12_S1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_10multipliesES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1P_INS1Q_IS1R_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEENS4_13SM90_TMA_LOADENS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S26_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2A_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:31.9237313Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES11_S1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1E_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1L_NS1E_6fusion15Sm90TreeVisitorINS1N_11Sm90ComputeINS_4plusES1M_fLNS_15FloatRoundStyleE2EvEEJNS1N_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1O_INS1P_INS_10multipliesEffLS1R_2EvEEJS1V_NS1O_IS1X_JNS1N_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1N_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S27_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2B_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:31.9263417Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES12_S1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1P_INS1Q_INS_10multipliesEffLS1S_2EvEEJS1W_NS1P_IS1Y_JNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S28_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2C_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:31.9289235Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES11_S1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1E_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1L_NS1E_6fusion15Sm90TreeVisitorINS1N_11Sm90ComputeINS_4plusES1M_S1M_LNS_15FloatRoundStyleE2EvEEJNS1N_16Sm90ColBroadcastILi0ESI_S1M_S1M_NS5_IJSC_SV_SV_EEELi8ELb1EEENS1O_INS1P_INS_10multipliesES1M_fLS1R_2EvEEJNS1T_ILi0ESI_ffS1U_Li4ELb1EEENS1O_INS1P_IS1W_ffLS1R_2EvEEJNS1N_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1N_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S29_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2D_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:31.9314472Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES12_S1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_S1N_LNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_S1N_S1N_NS5_IJSC_SW_SW_EEELi8ELb1EEENS1P_INS1Q_INS_10multipliesES1N_fLS1S_2EvEEJNS1U_ILi0ESI_ffS1V_Li4ELb1EEENS1P_INS1Q_IS1X_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S2A_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2E_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:59.3035219Z [110/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o 2025-05-07T19:59:59.3100213Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:59.3103298Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:59.3161214Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:59.3162276Z ^ 2025-05-07T19:59:59.3229345Z 2025-05-07T19:59:59.3229863Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:59.3230581Z 2025-05-07T19:59:59.3232252Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:59.3234574Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:59.3235427Z ^ 2025-05-07T19:59:59.3235749Z 2025-05-07T20:00:01.9442438Z [111/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o 2025-05-07T20:00:01.9455316Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:01.9457131Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:01.9458299Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:01.9458814Z ^ 2025-05-07T20:00:01.9459001Z 2025-05-07T20:00:01.9459258Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:01.9506580Z 2025-05-07T20:00:01.9507809Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:01.9509132Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:01.9509603Z ^ 2025-05-07T20:00:01.9509815Z 2025-05-07T20:00:03.5871617Z [112/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16_fast_gemv.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o 2025-05-07T20:00:03.5959375Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:04.0058017Z [113/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o 2025-05-07T20:00:04.0070386Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:10.5177864Z [114/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o 2025-05-07T20:00:10.5239159Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:10.5240793Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:10.5241991Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:10.5242471Z ^ 2025-05-07T20:00:10.5303833Z 2025-05-07T20:00:10.5304175Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:10.5304594Z 2025-05-07T20:00:10.5305532Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:10.5306833Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:10.5368960Z ^ 2025-05-07T20:00:10.5369275Z 2025-05-07T20:00:18.2908208Z [115/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o 2025-05-07T20:00:18.2998514Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:18.3001688Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:18.3003875Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:18.3004729Z ^ 2025-05-07T20:00:18.3005072Z 2025-05-07T20:00:18.3062141Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:18.3063179Z 2025-05-07T20:00:18.3064840Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:18.3067057Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:18.3067846Z ^ 2025-05-07T20:00:18.3068138Z 2025-05-07T20:00:22.1722217Z [116/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o 2025-05-07T20:00:22.1745207Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:22.1748137Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.1750266Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:22.1751044Z ^ 2025-05-07T20:00:22.1751368Z 2025-05-07T20:00:22.1751806Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:22.1752439Z 2025-05-07T20:00:22.1753963Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.1756096Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:22.1766893Z ^ 2025-05-07T20:00:22.1767268Z 2025-05-07T20:00:23.3115705Z [117/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o 2025-05-07T20:00:23.3137668Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:24.3534720Z [118/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_example_py_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o -MF experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/cutlass_sgemm_nn.cu -o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o 2025-05-07T20:00:24.3556308Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:24.9664721Z [119/156] : && /github/home/miniconda/envs/build_binary/bin/c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,fbgemm_gpu_experimental_example_py.so -o experimental/example/fbgemm_gpu_experimental_example_py.so experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib:/github/home/miniconda/envs/build_binary/lib/stubs: /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so" -Wl,--as-needed -lcudadevrt -lcudart_static -ldl && : 2025-05-07T20:00:24.9755263Z [120/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/experimental/example && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:00:24.9757893Z ################################################################################ 2025-05-07T20:00:24.9758536Z [CMAKE] Running post-build script ... 2025-05-07T20:00:24.9759812Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:00:24.9761442Z Removing all RPATHs ... 2025-05-07T20:00:24.9761905Z ################################################################################ 2025-05-07T20:00:54.0850592Z [121/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o 2025-05-07T20:00:54.0870079Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:54.0872506Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:54.0874338Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:54.0875224Z ^ 2025-05-07T20:00:54.1046240Z 2025-05-07T20:00:54.1047157Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:54.1047768Z 2025-05-07T20:00:54.1049089Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:54.1050898Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:54.1055470Z ^ 2025-05-07T20:00:54.1055821Z 2025-05-07T20:00:55.5640859Z [122/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o 2025-05-07T20:00:55.5751617Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:55.5782038Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:55.5812245Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:55.5813037Z ^ 2025-05-07T20:00:55.5813403Z 2025-05-07T20:00:55.5813818Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:55.5814444Z 2025-05-07T20:00:55.5849545Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:55.5851951Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:55.5852683Z ^ 2025-05-07T20:00:55.5852978Z 2025-05-07T20:00:55.5854577Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:55.5857479Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:55.5858462Z ^ 2025-05-07T20:00:55.5858865Z 2025-05-07T20:00:55.5860412Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:55.5862586Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:55.5863380Z ^ 2025-05-07T20:00:55.5863791Z 2025-05-07T20:00:55.5865337Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:55.5867581Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:55.5868382Z ^ 2025-05-07T20:00:55.5868774Z 2025-05-07T20:00:55.5869203Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:55.5869828Z 2025-05-07T20:00:55.5871384Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:55.5873588Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:55.5874398Z ^ 2025-05-07T20:00:55.5875044Z 2025-05-07T20:00:55.5876659Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:55.5878901Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:55.5879701Z ^ 2025-05-07T20:00:55.5880069Z 2025-05-07T20:00:55.5880462Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:55.5881079Z 2025-05-07T20:00:55.5882652Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:55.5884891Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:55.5885683Z ^ 2025-05-07T20:00:55.5886076Z 2025-05-07T20:00:55.5887697Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:55.5889860Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:55.5890657Z ^ 2025-05-07T20:00:55.5891017Z 2025-05-07T20:00:55.5891419Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:55.5892010Z 2025-05-07T20:00:55.5893579Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:55.5896276Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:55.5897263Z ^ 2025-05-07T20:00:55.5897661Z 2025-05-07T20:00:55.5899237Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:55.5901426Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:55.5902232Z ^ 2025-05-07T20:00:55.5902613Z 2025-05-07T20:00:55.5903001Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:55.5903598Z 2025-05-07T20:00:55.5905204Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:55.5907439Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:55.5908267Z ^ 2025-05-07T20:00:55.5908653Z 2025-05-07T20:00:55.5910272Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:55.5912444Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:55.5913251Z ^ 2025-05-07T20:00:55.5913602Z 2025-05-07T20:00:55.5914031Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:55.5914617Z 2025-05-07T20:00:55.5916248Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:55.5918419Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:55.5919223Z ^ 2025-05-07T20:00:55.5919634Z 2025-05-07T20:00:55.5921243Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:55.5923439Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:55.5924226Z ^ 2025-05-07T20:00:55.5924610Z 2025-05-07T20:00:55.5925020Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:55.5925628Z 2025-05-07T20:00:55.5927274Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:55.5929513Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:55.5930376Z ^ 2025-05-07T20:00:55.5930787Z 2025-05-07T20:00:59.1426175Z [123/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o 2025-05-07T20:00:59.1446597Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:59.1449207Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:59.1451216Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:59.1451930Z ^ 2025-05-07T20:00:59.1452286Z 2025-05-07T20:00:59.1452677Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:59.1453213Z 2025-05-07T20:00:59.1454639Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:59.1456749Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:59.1457450Z ^ 2025-05-07T20:00:59.1457786Z 2025-05-07T20:00:59.1459171Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:59.1461430Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:59.1462644Z ^ 2025-05-07T20:00:59.1462972Z 2025-05-07T20:00:59.1463364Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:59.1463900Z 2025-05-07T20:00:59.1465302Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:59.1467254Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:59.1467970Z ^ 2025-05-07T20:00:59.1468358Z 2025-05-07T20:00:59.1469779Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:59.1471728Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:59.1472428Z ^ 2025-05-07T20:00:59.1472781Z 2025-05-07T20:00:59.1473140Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:59.1473675Z 2025-05-07T20:00:59.1475384Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:59.1477346Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:59.1478086Z ^ 2025-05-07T20:00:59.1478434Z 2025-05-07T20:00:59.1479885Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:59.1481725Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:59.1482420Z ^ 2025-05-07T20:00:59.1482734Z 2025-05-07T20:00:59.1483073Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:59.1483590Z 2025-05-07T20:00:59.1484903Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:59.1486865Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:59.1487627Z ^ 2025-05-07T20:00:59.1488007Z 2025-05-07T20:00:59.1489410Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:59.1491377Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:59.1492088Z ^ 2025-05-07T20:00:59.1492500Z 2025-05-07T20:00:59.1492852Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:59.1493681Z 2025-05-07T20:00:59.1495329Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:59.1497393Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:59.1498151Z ^ 2025-05-07T20:00:59.1498505Z 2025-05-07T20:00:59.1499913Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:59.1501939Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:59.1502696Z ^ 2025-05-07T20:00:59.1503016Z 2025-05-07T20:00:59.1503373Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:59.1503935Z 2025-05-07T20:00:59.1505428Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:59.1507457Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:59.1508193Z ^ 2025-05-07T20:00:59.1508579Z 2025-05-07T20:00:59.1510050Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:59.1512156Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:59.1512967Z ^ 2025-05-07T20:00:59.1513342Z 2025-05-07T20:00:59.1513722Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:59.1514282Z 2025-05-07T20:00:59.1515930Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:00:59.1518223Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:00:59.1519082Z ^ 2025-05-07T20:00:59.1519477Z 2025-05-07T20:01:19.0096288Z [124/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o 2025-05-07T20:01:19.0114420Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:19.0116885Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:19.0118656Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:19.0119313Z ^ 2025-05-07T20:01:19.0119559Z 2025-05-07T20:01:19.0119968Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:19.0120544Z 2025-05-07T20:01:19.0121711Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:19.0123297Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:19.0123916Z ^ 2025-05-07T20:01:19.0124166Z 2025-05-07T20:01:19.6631111Z [125/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o 2025-05-07T20:01:19.6653111Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:19.6655846Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:19.6657853Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:19.6658642Z ^ 2025-05-07T20:01:19.6658911Z 2025-05-07T20:01:19.6659303Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:19.6659917Z 2025-05-07T20:01:19.6661384Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:19.6663378Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:19.6664159Z ^ 2025-05-07T20:01:19.6664453Z 2025-05-07T20:01:19.6665869Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:19.6667804Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:19.6668569Z ^ 2025-05-07T20:01:19.6669023Z detected during: 2025-05-07T20:01:19.6695241Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:19.6744161Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:19.6794709Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:19.6822577Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:19.6824632Z 2025-05-07T20:01:19.6825078Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:19.6825650Z 2025-05-07T20:01:19.6827089Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:19.6829003Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:19.6829741Z ^ 2025-05-07T20:01:19.6830138Z detected during: 2025-05-07T20:01:19.6854520Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:19.6903787Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:19.6953384Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:19.7001787Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:19.7029615Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:19.7031615Z 2025-05-07T20:01:19.7032909Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:19.7034779Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:19.7035502Z ^ 2025-05-07T20:01:19.7035928Z detected during: 2025-05-07T20:01:19.7061289Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:19.7108588Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:19.7157576Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:19.7186000Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:19.7187851Z 2025-05-07T20:01:19.7188250Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:19.7188813Z 2025-05-07T20:01:19.7190155Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:19.7192058Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:19.7192739Z ^ 2025-05-07T20:01:19.7193139Z detected during: 2025-05-07T20:01:19.7216609Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:19.7263537Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:19.7308779Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:19.7356152Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:19.7383487Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:19.7385610Z 2025-05-07T20:01:19.7387151Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:19.7389023Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:19.7389753Z ^ 2025-05-07T20:01:19.7390202Z detected during: 2025-05-07T20:01:19.7415155Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:19.7462807Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:19.7510380Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:19.7537614Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:19.7539509Z 2025-05-07T20:01:19.7539935Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:19.7540487Z 2025-05-07T20:01:19.7541819Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:19.7543716Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:19.7544378Z ^ 2025-05-07T20:01:19.7544734Z detected during: 2025-05-07T20:01:19.7568776Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:19.7617360Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:19.7664065Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:19.7713820Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:19.7740796Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:19.7742720Z 2025-05-07T20:01:19.7744861Z ptxas /tmp/tmpxft_00008c87_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 925; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:01:19.7749500Z ptxas /tmp/tmpxft_00008c87_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 937; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:01:19.7753975Z ptxas /tmp/tmpxft_00008c87_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 1076; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:01:19.7758447Z ptxas /tmp/tmpxft_00008c87_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 1088; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:01:19.7762224Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:19.7764224Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:19.7764943Z ^ 2025-05-07T20:01:19.7765375Z detected during: 2025-05-07T20:01:19.7791362Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:19.7839836Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:19.7889383Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:19.7916302Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:19.7918234Z 2025-05-07T20:01:19.7918617Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:19.7919173Z 2025-05-07T20:01:19.7920462Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:19.7922270Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:19.7922927Z ^ 2025-05-07T20:01:19.7923301Z detected during: 2025-05-07T20:01:19.7947517Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:19.7996187Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:19.8042192Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:19.8094255Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:19.8123726Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:19.8125858Z 2025-05-07T20:01:19.8127350Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:19.8129445Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:19.8130247Z ^ 2025-05-07T20:01:19.8130700Z detected during: 2025-05-07T20:01:19.8157912Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:19.8194321Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:19.8222850Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:19.8238961Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:19.8240134Z 2025-05-07T20:01:19.8240385Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:19.8240775Z 2025-05-07T20:01:19.8241588Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:19.8242726Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:19.8243147Z ^ 2025-05-07T20:01:19.8243412Z detected during: 2025-05-07T20:01:19.8257243Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:19.8285748Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:19.8313650Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:19.8341976Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:19.8358182Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:19.8359418Z 2025-05-07T20:01:19.8360229Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:19.8361447Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:19.8361904Z ^ 2025-05-07T20:01:19.8362188Z detected during: 2025-05-07T20:01:19.8377182Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:19.8405051Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:19.8433448Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:19.8449448Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:19.8450608Z 2025-05-07T20:01:19.8450861Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:19.8451216Z 2025-05-07T20:01:19.8452038Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:19.8453152Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:19.8453588Z ^ 2025-05-07T20:01:19.8453824Z detected during: 2025-05-07T20:01:19.8467643Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:19.8496187Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:19.8524041Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:19.8552488Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:19.8568699Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:01:19.8569836Z 2025-05-07T20:01:27.6289983Z [126/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o 2025-05-07T20:01:27.6313474Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:27.6316407Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:27.6318497Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:27.6319794Z ^ 2025-05-07T20:01:27.6320096Z 2025-05-07T20:01:27.6320518Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:27.6321207Z 2025-05-07T20:01:27.6322900Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:27.6325055Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:27.6325912Z ^ 2025-05-07T20:01:27.6326243Z 2025-05-07T20:01:27.6327670Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:27.6329778Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:27.6330588Z ^ 2025-05-07T20:01:27.6331123Z detected during: 2025-05-07T20:01:27.6358650Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:27.6563800Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:27.6617678Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:27.6647847Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:27.6650032Z 2025-05-07T20:01:27.6650474Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:27.6651145Z 2025-05-07T20:01:27.6652601Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:27.6654655Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:27.6655456Z ^ 2025-05-07T20:01:27.6655888Z detected during: 2025-05-07T20:01:27.6724184Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:27.6777804Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:27.6830224Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:27.6883422Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:27.6913296Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:27.6915440Z 2025-05-07T20:01:27.6916917Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:27.6919041Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:27.6919844Z ^ 2025-05-07T20:01:27.6920338Z detected during: 2025-05-07T20:01:27.6947885Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:27.7000402Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:27.7052580Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:27.7082325Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:27.7084474Z 2025-05-07T20:01:27.7084916Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:27.7085568Z 2025-05-07T20:01:27.7087045Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:27.7089394Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:27.7090123Z ^ 2025-05-07T20:01:27.7090670Z detected during: 2025-05-07T20:01:27.7116243Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:27.7169184Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:27.7223447Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:27.7276251Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:27.7305983Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:27.7308076Z 2025-05-07T20:01:27.7309564Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:27.7311590Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:27.7312387Z ^ 2025-05-07T20:01:27.7312839Z detected during: 2025-05-07T20:01:27.7339847Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:27.7392051Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:27.7444422Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:27.7474145Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:27.7476444Z 2025-05-07T20:01:27.7476917Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:27.7477559Z 2025-05-07T20:01:27.7479024Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:27.7481062Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:27.7481767Z ^ 2025-05-07T20:01:27.7482174Z detected during: 2025-05-07T20:01:27.7507714Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:27.7560204Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:27.7612545Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:27.7665005Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:27.7693657Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:27.7695829Z 2025-05-07T20:01:27.7698090Z ptxas /tmp/tmpxft_00008c8a_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 925; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:01:27.7702707Z ptxas /tmp/tmpxft_00008c8a_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 937; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:01:27.7707304Z ptxas /tmp/tmpxft_00008c8a_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 1076; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:01:27.7712513Z ptxas /tmp/tmpxft_00008c8a_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 1088; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:01:27.7716359Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:27.7718428Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:27.7719209Z ^ 2025-05-07T20:01:27.7719654Z detected during: 2025-05-07T20:01:27.7747031Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:27.7789503Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:27.7818107Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:27.7834704Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:27.7835856Z 2025-05-07T20:01:27.7836105Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:27.7836492Z 2025-05-07T20:01:27.7837301Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:27.7838454Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:27.7838876Z ^ 2025-05-07T20:01:27.7839136Z detected during: 2025-05-07T20:01:27.7853002Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:27.7881856Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:27.7910053Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:27.7938585Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:27.7954663Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:27.7955829Z 2025-05-07T20:01:27.7956642Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:27.7957806Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:27.7958255Z ^ 2025-05-07T20:01:27.7958540Z detected during: 2025-05-07T20:01:27.7973280Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:27.8001494Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:27.8029937Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:27.8045928Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:27.8047073Z 2025-05-07T20:01:27.8047319Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:27.8047683Z 2025-05-07T20:01:27.8048511Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:27.8049622Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:27.8050100Z ^ 2025-05-07T20:01:27.8050336Z detected during: 2025-05-07T20:01:27.8064115Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:27.8094028Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:27.8122189Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:27.8150530Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:27.8166555Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:27.8167689Z 2025-05-07T20:01:27.8168526Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:27.8169682Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:27.8170154Z ^ 2025-05-07T20:01:27.8170420Z detected during: 2025-05-07T20:01:27.8185318Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:27.8213316Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:27.8266842Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:27.8283853Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:27.8285049Z 2025-05-07T20:01:27.8285310Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:27.8285684Z 2025-05-07T20:01:27.8286523Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:27.8287707Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:27.8288161Z ^ 2025-05-07T20:01:27.8288477Z detected during: 2025-05-07T20:01:27.8302313Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:27.8331115Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:27.8359272Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:27.8388171Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:27.8559233Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:01:27.8560479Z 2025-05-07T20:01:37.7513896Z [127/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o 2025-05-07T20:01:37.7528656Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:37.7530264Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:37.7531425Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:37.7531922Z ^ 2025-05-07T20:01:37.7532103Z 2025-05-07T20:01:37.7532358Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:37.7532752Z 2025-05-07T20:01:37.7533589Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:37.7534873Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:37.7535500Z ^ 2025-05-07T20:01:37.7535706Z 2025-05-07T20:01:37.7536509Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:37.7537684Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:37.7538145Z ^ 2025-05-07T20:01:37.7538454Z detected during: 2025-05-07T20:01:37.7553336Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:37.7582162Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:37.7610986Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:37.7627250Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:37.7628426Z 2025-05-07T20:01:37.7628683Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:37.7629074Z 2025-05-07T20:01:37.7629887Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:37.7631033Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:37.7631459Z ^ 2025-05-07T20:01:37.7631737Z detected during: 2025-05-07T20:01:37.7645741Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:37.7674350Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:37.7703402Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:37.7732134Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:37.7748395Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:37.7749572Z 2025-05-07T20:01:37.7750379Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:37.7751566Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:37.7752099Z ^ 2025-05-07T20:01:37.7752402Z detected during: 2025-05-07T20:01:37.7767397Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:37.7795809Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:37.7836817Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:37.7853226Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:37.7854395Z 2025-05-07T20:01:37.7854649Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:37.7855023Z 2025-05-07T20:01:37.7855970Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:37.7857099Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:37.7857538Z ^ 2025-05-07T20:01:37.7857813Z detected during: 2025-05-07T20:01:37.7871714Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:37.7900622Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:37.7928935Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:37.7957724Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:37.7974083Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:37.7975450Z 2025-05-07T20:01:37.7976257Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:37.7977425Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:37.7977891Z ^ 2025-05-07T20:01:37.7978194Z detected during: 2025-05-07T20:01:37.7993129Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:37.8021415Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:37.8050232Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:37.8066396Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:37.8067582Z 2025-05-07T20:01:37.8067825Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:37.8068189Z 2025-05-07T20:01:37.8068989Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:37.8070135Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:37.8070574Z ^ 2025-05-07T20:01:37.8070809Z detected during: 2025-05-07T20:01:37.8084881Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:37.8113420Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:37.8141559Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:37.8170261Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:37.8187592Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:37.8188729Z 2025-05-07T20:01:37.8189563Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:37.8190702Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:37.8191185Z ^ 2025-05-07T20:01:37.8191457Z detected during: 2025-05-07T20:01:37.8206222Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:37.8234336Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:37.8262870Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:37.8279171Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:37.8280386Z 2025-05-07T20:01:37.8280659Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:37.8281143Z 2025-05-07T20:01:37.8281949Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:37.8283080Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:37.8283498Z ^ 2025-05-07T20:01:37.8283762Z detected during: 2025-05-07T20:01:37.8297668Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:37.8326228Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:37.8354353Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:37.8383024Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:37.8399202Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:37.8400329Z 2025-05-07T20:01:37.8401129Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:37.8402311Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:37.8402785Z ^ 2025-05-07T20:01:37.8403085Z detected during: 2025-05-07T20:01:37.8417933Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:37.8446077Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:37.8475382Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:37.8491548Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:37.8492681Z 2025-05-07T20:01:37.8492953Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:37.8493322Z 2025-05-07T20:01:37.8494114Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:37.8495245Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:37.8495755Z ^ 2025-05-07T20:01:37.8495995Z detected during: 2025-05-07T20:01:37.8509894Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:37.8538457Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:37.8566838Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:37.8595782Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:37.8611981Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:37.8613116Z 2025-05-07T20:01:37.8613920Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:37.8615145Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:37.8615664Z ^ 2025-05-07T20:01:37.8615937Z detected during: 2025-05-07T20:01:37.8630708Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:37.8658836Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:37.8687905Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:37.8704303Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:37.8705439Z 2025-05-07T20:01:37.8705696Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:37.8706085Z 2025-05-07T20:01:37.8706888Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:37.8708027Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:37.8708442Z ^ 2025-05-07T20:01:37.8708697Z detected during: 2025-05-07T20:01:37.8722606Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:37.8751149Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:37.8779494Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:37.8808054Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:37.8825400Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:37.8826551Z 2025-05-07T20:01:38.2465030Z [128/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o 2025-05-07T20:01:38.2477805Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:38.2479398Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:38.2480546Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:38.2481386Z ^ 2025-05-07T20:01:38.2481562Z 2025-05-07T20:01:38.2481809Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:38.2482189Z 2025-05-07T20:01:38.2483134Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:38.2484326Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:38.2484766Z ^ 2025-05-07T20:01:38.2484959Z 2025-05-07T20:01:38.2485749Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:38.2486910Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:38.2487361Z ^ 2025-05-07T20:01:38.2487652Z detected during: 2025-05-07T20:01:38.2502326Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:38.2530211Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:38.2558456Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:38.2574430Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:38.2575888Z 2025-05-07T20:01:38.2576143Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:38.2576507Z 2025-05-07T20:01:38.2577334Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:38.2578455Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:38.2578894Z ^ 2025-05-07T20:01:38.2579139Z detected during: 2025-05-07T20:01:38.2593506Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:38.2621770Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:38.2659948Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:38.2690077Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:38.2706230Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:38.2707404Z 2025-05-07T20:01:38.2708221Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:38.2709396Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:38.2709847Z ^ 2025-05-07T20:01:38.2710145Z detected during: 2025-05-07T20:01:38.2724801Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:38.2752549Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:38.2781083Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:38.2798167Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:38.2799329Z 2025-05-07T20:01:38.2799579Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:38.2799946Z 2025-05-07T20:01:38.2800776Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:38.2801900Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:38.2802343Z ^ 2025-05-07T20:01:38.2802623Z detected during: 2025-05-07T20:01:38.2816784Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:38.2845331Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:38.2873265Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:38.2901705Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:38.2917696Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:38.2918865Z 2025-05-07T20:01:38.2919669Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:38.2920850Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:38.2921319Z ^ 2025-05-07T20:01:38.2921640Z detected during: 2025-05-07T20:01:38.2936275Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:38.2964698Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:38.2993236Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:38.3009246Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:38.3010410Z 2025-05-07T20:01:38.3010669Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:38.3011106Z 2025-05-07T20:01:38.3011918Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:38.3013130Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:38.3013588Z ^ 2025-05-07T20:01:38.3013841Z detected during: 2025-05-07T20:01:38.3027784Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:38.3056069Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:38.3084051Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:38.3112958Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:38.3129006Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:38.3130144Z 2025-05-07T20:01:38.3130986Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:38.3132152Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:38.3132623Z ^ 2025-05-07T20:01:38.3132914Z detected during: 2025-05-07T20:01:38.3147535Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:38.3175556Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:38.3203669Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:38.3219837Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:38.3220984Z 2025-05-07T20:01:38.3221274Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:38.3221643Z 2025-05-07T20:01:38.3222450Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:38.3223607Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:38.3224039Z ^ 2025-05-07T20:01:38.3224337Z detected during: 2025-05-07T20:01:38.3238248Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:38.3266622Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:38.3294409Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:38.3322694Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:38.3338712Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:38.3339853Z 2025-05-07T20:01:38.3340667Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:38.3341866Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:38.3342353Z ^ 2025-05-07T20:01:38.3342633Z detected during: 2025-05-07T20:01:38.3357301Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:38.3385170Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:38.3413378Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:38.3430022Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:38.3431200Z 2025-05-07T20:01:38.3431460Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:38.3431866Z 2025-05-07T20:01:38.3432680Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:38.3433802Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:38.3434256Z ^ 2025-05-07T20:01:38.3434535Z detected during: 2025-05-07T20:01:38.3448492Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:38.3477013Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:38.3504739Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:38.3532906Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:38.3548897Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:38.3550035Z 2025-05-07T20:01:38.3550853Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:38.3551996Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:38.3552464Z ^ 2025-05-07T20:01:38.3552731Z detected during: 2025-05-07T20:01:38.3567375Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:38.3595481Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:38.3623685Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:38.3639586Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:38.3640720Z 2025-05-07T20:01:38.3640995Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:38.3641365Z 2025-05-07T20:01:38.3642167Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:38.3643294Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:38.3643703Z ^ 2025-05-07T20:01:38.3643971Z detected during: 2025-05-07T20:01:38.3657889Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:38.3686272Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:38.3714346Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:38.3742541Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:38.3758499Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:01:38.3759619Z 2025-05-07T20:01:39.2470680Z [129/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o 2025-05-07T20:01:39.2483763Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:39.2485341Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.2486502Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.2486970Z ^ 2025-05-07T20:01:39.2487145Z 2025-05-07T20:01:39.2487414Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.2487777Z 2025-05-07T20:01:39.2488599Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.2489778Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:39.2490223Z ^ 2025-05-07T20:01:39.2490415Z 2025-05-07T20:01:39.2491209Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.2492489Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.2492928Z ^ 2025-05-07T20:01:39.2493212Z detected during: 2025-05-07T20:01:39.2507928Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.2535543Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.2563558Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.2579823Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:39.2580978Z 2025-05-07T20:01:39.2581220Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.2581612Z 2025-05-07T20:01:39.2582416Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.2583553Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.2583960Z ^ 2025-05-07T20:01:39.2584219Z detected during: 2025-05-07T20:01:39.2598134Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.2626274Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.2653816Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.2682220Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.2698628Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:39.2699871Z 2025-05-07T20:01:39.2700736Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.2701914Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.2702362Z ^ 2025-05-07T20:01:39.2702649Z detected during: 2025-05-07T20:01:39.2717109Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.2745431Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.2773685Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.2789838Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:39.2790993Z 2025-05-07T20:01:39.2791244Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.2791606Z 2025-05-07T20:01:39.2792426Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.2793542Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.2793967Z ^ 2025-05-07T20:01:39.2794207Z detected during: 2025-05-07T20:01:39.2808126Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.2836588Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.2864400Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.2892691Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.2908763Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:39.2909885Z 2025-05-07T20:01:39.2910713Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.2911860Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.2912340Z ^ 2025-05-07T20:01:39.2912613Z detected during: 2025-05-07T20:01:39.2927226Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.2955206Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.2983887Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.2999796Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:39.3000943Z 2025-05-07T20:01:39.3001211Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.3001576Z 2025-05-07T20:01:39.3002377Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.3003516Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.3003920Z ^ 2025-05-07T20:01:39.3004178Z detected during: 2025-05-07T20:01:39.3018056Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.3046174Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.3073867Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.3102111Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.3118039Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:39.3119183Z 2025-05-07T20:01:39.3119988Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.3121148Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.3121621Z ^ 2025-05-07T20:01:39.3121887Z detected during: 2025-05-07T20:01:39.3136452Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.3164619Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.3192913Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.3209151Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:39.3210316Z 2025-05-07T20:01:39.3210565Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.3210945Z 2025-05-07T20:01:39.3211742Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.3212848Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.3213293Z ^ 2025-05-07T20:01:39.3213549Z detected during: 2025-05-07T20:01:39.3227419Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.3255634Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.3284170Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.3312230Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.3328139Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:39.3329296Z 2025-05-07T20:01:39.3330096Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.3331262Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.3331721Z ^ 2025-05-07T20:01:39.3332024Z detected during: 2025-05-07T20:01:39.3346590Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.3374138Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.3402309Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.3418277Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:39.3419432Z 2025-05-07T20:01:39.3419685Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.3420041Z 2025-05-07T20:01:39.3420937Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.3422047Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.3422527Z ^ 2025-05-07T20:01:39.3422770Z detected during: 2025-05-07T20:01:39.3436696Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.3464864Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.3492481Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.3520645Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.3536747Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:39.3537886Z 2025-05-07T20:01:39.3538716Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.3539854Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.3540335Z ^ 2025-05-07T20:01:39.3540608Z detected during: 2025-05-07T20:01:39.3556237Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.3583949Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.3612046Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.3627946Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:39.3629197Z 2025-05-07T20:01:39.3629471Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:39.3629827Z 2025-05-07T20:01:39.3630624Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:39.3631755Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:39.3632167Z ^ 2025-05-07T20:01:39.3632421Z detected during: 2025-05-07T20:01:39.3646214Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:39.3674423Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:39.3702612Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:39.3730970Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:39.3746933Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:39.3748064Z 2025-05-07T20:01:40.9464223Z [130/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o 2025-05-07T20:01:40.9477165Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:40.9478749Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.9479914Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.9480393Z ^ 2025-05-07T20:01:40.9480567Z 2025-05-07T20:01:40.9480814Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:40.9481204Z 2025-05-07T20:01:40.9482023Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.9483204Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:40.9483654Z ^ 2025-05-07T20:01:40.9483855Z 2025-05-07T20:01:40.9484650Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.9485816Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.9486255Z ^ 2025-05-07T20:01:40.9486547Z detected during: 2025-05-07T20:01:40.9501555Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.9529791Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.9558942Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.9575381Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:40.9576538Z 2025-05-07T20:01:40.9576787Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:40.9577150Z 2025-05-07T20:01:40.9577978Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.9579089Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.9579520Z ^ 2025-05-07T20:01:40.9579757Z detected during: 2025-05-07T20:01:40.9593650Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:40.9622213Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.9650356Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.9679054Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.9695384Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:40.9696518Z 2025-05-07T20:01:40.9697352Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.9698503Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.9698985Z ^ 2025-05-07T20:01:40.9699260Z detected during: 2025-05-07T20:01:40.9714216Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.9742317Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.9770919Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.9787251Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:40.9788385Z 2025-05-07T20:01:40.9788655Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:40.9789020Z 2025-05-07T20:01:40.9789823Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.9790957Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.9791365Z ^ 2025-05-07T20:01:40.9791615Z detected during: 2025-05-07T20:01:40.9805525Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:40.9833974Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.9862209Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.9891019Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.9907367Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:40.9908495Z 2025-05-07T20:01:40.9909294Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.9910460Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.9910928Z ^ 2025-05-07T20:01:40.9911199Z detected during: 2025-05-07T20:01:40.9926071Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.9954332Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.9983058Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.9999249Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:41.0000396Z 2025-05-07T20:01:41.0000645Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:41.0001032Z 2025-05-07T20:01:41.0001833Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.0002964Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.0003376Z ^ 2025-05-07T20:01:41.0003636Z detected during: 2025-05-07T20:01:41.0017554Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:41.0045898Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.0074250Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.0103041Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.0119373Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:41.0120533Z 2025-05-07T20:01:41.0121330Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.0122500Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.0122948Z ^ 2025-05-07T20:01:41.0123236Z detected during: 2025-05-07T20:01:41.0138003Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.0166265Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.0195403Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.0211529Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:41.0212655Z 2025-05-07T20:01:41.0212895Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:41.0213310Z 2025-05-07T20:01:41.0214114Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.0215268Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.0215731Z ^ 2025-05-07T20:01:41.0215953Z detected during: 2025-05-07T20:01:41.0229790Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:41.0276331Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.0304755Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.0333223Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.0349540Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:41.0350678Z 2025-05-07T20:01:41.0351485Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.0352662Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.0353139Z ^ 2025-05-07T20:01:41.0353415Z detected during: 2025-05-07T20:01:41.0368321Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.0396692Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.0425324Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.0441588Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:41.0442724Z 2025-05-07T20:01:41.0442977Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:41.0443375Z 2025-05-07T20:01:41.0444175Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.0445328Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.0445742Z ^ 2025-05-07T20:01:41.0446004Z detected during: 2025-05-07T20:01:41.0459927Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:41.0488493Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.0516698Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.0545253Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.0561375Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:41.0562533Z 2025-05-07T20:01:41.0563335Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.0564497Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.0564950Z ^ 2025-05-07T20:01:41.0565241Z detected during: 2025-05-07T20:01:41.0580320Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.0608791Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.0637405Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.0653568Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:41.0654730Z 2025-05-07T20:01:41.0654982Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:41.0655412Z 2025-05-07T20:01:41.0656246Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:41.0657371Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:41.0657805Z ^ 2025-05-07T20:01:41.0658050Z detected during: 2025-05-07T20:01:41.0671932Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:41.0700561Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:41.0728725Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:41.0757276Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:41.0773515Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:41.0774641Z 2025-05-07T20:01:42.9934278Z [131/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o 2025-05-07T20:01:43.0022645Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:43.0024422Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:43.0025633Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:43.0026093Z ^ 2025-05-07T20:01:43.0026274Z 2025-05-07T20:01:43.0026543Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:43.0026907Z 2025-05-07T20:01:43.0027739Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:43.0030479Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:43.0030927Z ^ 2025-05-07T20:01:43.0031102Z 2025-05-07T20:01:43.0032048Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:43.0033190Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:43.0033672Z ^ 2025-05-07T20:01:43.0033974Z detected during: 2025-05-07T20:01:43.0048485Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:43.0076481Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:43.0104601Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:43.0120525Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:43.0121689Z 2025-05-07T20:01:43.0121940Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:43.0122302Z 2025-05-07T20:01:43.0123104Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:43.0124255Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:43.0124698Z ^ 2025-05-07T20:01:43.0124945Z detected during: 2025-05-07T20:01:43.0138816Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:43.0166962Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:43.0194653Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:43.0222792Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:43.0238686Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:43.0239836Z 2025-05-07T20:01:43.0240635Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:43.0241803Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:43.0242276Z ^ 2025-05-07T20:01:43.0242542Z detected during: 2025-05-07T20:01:43.0257116Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:43.0284869Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:43.0312866Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:43.0328775Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:43.0329909Z 2025-05-07T20:01:43.0330155Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:43.0330542Z 2025-05-07T20:01:43.0331339Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:43.0332473Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:43.0332879Z ^ 2025-05-07T20:01:43.0333140Z detected during: 2025-05-07T20:01:43.0346975Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:43.0375353Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:43.0402993Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:43.0431121Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:43.0447063Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:43.0448246Z 2025-05-07T20:01:43.0449084Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:43.0450244Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:43.0450690Z ^ 2025-05-07T20:01:43.0450963Z detected during: 2025-05-07T20:01:43.0465508Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:43.0493407Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:43.0521509Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:43.0537428Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:43.0538577Z 2025-05-07T20:01:43.0538827Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:43.0539183Z 2025-05-07T20:01:43.0540007Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:43.0541111Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:43.0541540Z ^ 2025-05-07T20:01:43.0541770Z detected during: 2025-05-07T20:01:43.0555682Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:43.0583992Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:43.0611447Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:43.0639481Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:43.0655432Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:43.0656568Z 2025-05-07T20:01:43.0657399Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:43.0658532Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:43.0659009Z ^ 2025-05-07T20:01:43.0659283Z detected during: 2025-05-07T20:01:43.0673714Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:43.0701510Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:43.0729560Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:43.0745545Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:43.0746686Z 2025-05-07T20:01:43.0746955Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:43.0747379Z 2025-05-07T20:01:43.0748169Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:43.0749354Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:43.0749764Z ^ 2025-05-07T20:01:43.0750025Z detected during: 2025-05-07T20:01:43.0763902Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:43.0792360Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:43.0819995Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:43.0848126Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:43.0864090Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:43.0865238Z 2025-05-07T20:01:43.0866040Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:43.0867194Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:43.0867665Z ^ 2025-05-07T20:01:43.0867935Z detected during: 2025-05-07T20:01:43.0882596Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:43.0910213Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:43.0938284Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:43.0954282Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:43.0955416Z 2025-05-07T20:01:43.0955659Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:43.0956039Z 2025-05-07T20:01:43.0956856Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:43.0958027Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:43.0958432Z ^ 2025-05-07T20:01:43.0958686Z detected during: 2025-05-07T20:01:43.0972602Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:43.1001000Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:43.1028556Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:43.1056608Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:43.1072521Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:43.1073682Z 2025-05-07T20:01:43.1074482Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:43.1075774Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:43.1076225Z ^ 2025-05-07T20:01:43.1076520Z detected during: 2025-05-07T20:01:43.1091052Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:43.1119567Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:43.1147721Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:43.1163720Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:43.1164886Z 2025-05-07T20:01:43.1165129Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:43.1165491Z 2025-05-07T20:01:43.1166317Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:43.1167422Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:43.1167852Z ^ 2025-05-07T20:01:43.1168095Z detected during: 2025-05-07T20:01:43.1182092Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:43.1210251Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:43.1237779Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:43.1265746Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:43.1281670Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:43.1282801Z 2025-05-07T20:01:48.0001399Z [132/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o 2025-05-07T20:01:48.0014359Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:48.0016027Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:48.0017169Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:48.0017600Z ^ 2025-05-07T20:01:48.0017769Z 2025-05-07T20:01:48.0018017Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:48.0018366Z 2025-05-07T20:01:48.0019180Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:48.0020358Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:48.0020822Z ^ 2025-05-07T20:01:48.0020995Z 2025-05-07T20:01:48.6688503Z [133/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o 2025-05-07T20:01:48.6701372Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:48.6702951Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:48.6704113Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:48.6704581Z ^ 2025-05-07T20:01:48.6704782Z 2025-05-07T20:01:48.6705037Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:48.6705392Z 2025-05-07T20:01:48.6706237Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:48.6707400Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:48.6707864Z ^ 2025-05-07T20:01:48.6708039Z 2025-05-07T20:01:48.6708849Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:48.6709988Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:48.6710465Z ^ 2025-05-07T20:01:48.6710738Z detected during: 2025-05-07T20:01:48.6725484Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:48.6753241Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:48.6781978Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:48.6798025Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:48.6799163Z 2025-05-07T20:01:48.6799416Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:48.6799807Z 2025-05-07T20:01:48.6800613Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:48.6801745Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:48.6802155Z ^ 2025-05-07T20:01:48.6802417Z detected during: 2025-05-07T20:01:48.6816351Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:48.6844660Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:48.6872488Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:48.6901433Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:48.6917374Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:48.6918539Z 2025-05-07T20:01:48.6919344Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:48.6920523Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:48.6920972Z ^ 2025-05-07T20:01:48.6921263Z detected during: 2025-05-07T20:01:48.6935963Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:48.6963666Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:48.6992169Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:48.7009211Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:48.7010344Z 2025-05-07T20:01:48.7010621Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:48.7010979Z 2025-05-07T20:01:48.7011777Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:48.7012924Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:48.7013353Z ^ 2025-05-07T20:01:48.7013596Z detected during: 2025-05-07T20:01:48.7027575Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:48.7055890Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:48.7083672Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:48.7111825Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:48.7127768Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:48.7128968Z 2025-05-07T20:01:48.7129796Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:48.7130990Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:48.7131461Z ^ 2025-05-07T20:01:48.7131734Z detected during: 2025-05-07T20:01:48.7146325Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:48.7173906Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:48.7202279Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:48.7218389Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:48.7219541Z 2025-05-07T20:01:48.7219792Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:48.7220170Z 2025-05-07T20:01:48.7220973Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:48.7222100Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:48.7222509Z ^ 2025-05-07T20:01:48.7222770Z detected during: 2025-05-07T20:01:48.7236636Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:48.7264958Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:48.7292718Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:48.7320866Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:48.7337973Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:48.7339126Z 2025-05-07T20:01:48.7339934Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:48.7341100Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:48.7341552Z ^ 2025-05-07T20:01:48.7341855Z detected during: 2025-05-07T20:01:48.7356461Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:48.7384427Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:48.7412589Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:48.7428571Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:48.7429725Z 2025-05-07T20:01:48.7429977Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:48.7430335Z 2025-05-07T20:01:48.7431156Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:48.7432264Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:48.7432691Z ^ 2025-05-07T20:01:48.7432952Z detected during: 2025-05-07T20:01:48.7446872Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:48.7475230Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:48.7502861Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:48.7531058Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:48.7547044Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:48.7548211Z 2025-05-07T20:01:48.7549016Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:48.7550180Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:48.7550632Z ^ 2025-05-07T20:01:48.7550923Z detected during: 2025-05-07T20:01:48.7572121Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:48.7600313Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:48.7628568Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:48.7644542Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:48.7645683Z 2025-05-07T20:01:48.7645931Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:48.7646323Z 2025-05-07T20:01:48.7647128Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:48.7648260Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:48.7648667Z ^ 2025-05-07T20:01:48.7648912Z detected during: 2025-05-07T20:01:48.7662830Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:48.7692396Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:48.7720292Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:48.7748358Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:48.7764260Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:48.7765428Z 2025-05-07T20:01:48.7766239Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:48.7767409Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:48.7767867Z ^ 2025-05-07T20:01:48.7768211Z detected during: 2025-05-07T20:01:48.7782936Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:48.7811335Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:48.7839753Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:48.7855863Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:48.7857024Z 2025-05-07T20:01:48.7857275Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:48.7857718Z 2025-05-07T20:01:48.7858546Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:48.7859700Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:48.7860141Z ^ 2025-05-07T20:01:48.7860384Z detected during: 2025-05-07T20:01:48.7874418Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:48.7903094Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:48.7930693Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:48.7958846Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:48.7974976Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:01:48.7976179Z 2025-05-07T20:01:50.1442121Z [134/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o 2025-05-07T20:01:50.1454736Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:50.1456468Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.1457639Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.1458120Z ^ 2025-05-07T20:01:50.1458292Z 2025-05-07T20:01:50.1458536Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:50.1458922Z 2025-05-07T20:01:50.1459741Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.1460889Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:50.1461361Z ^ 2025-05-07T20:01:50.1461530Z 2025-05-07T20:01:50.1462444Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.1463591Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.1464050Z ^ 2025-05-07T20:01:50.1464316Z detected during: 2025-05-07T20:01:50.1479166Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.1506955Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.1535034Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.1551018Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:50.1552218Z 2025-05-07T20:01:50.1552511Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:50.1552873Z 2025-05-07T20:01:50.1553681Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.1554807Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.1555218Z ^ 2025-05-07T20:01:50.1555481Z detected during: 2025-05-07T20:01:50.1569417Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:50.1597927Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.1627352Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.1655681Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.1671702Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:50.1672839Z 2025-05-07T20:01:50.1673643Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.1674971Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.1675456Z ^ 2025-05-07T20:01:50.1675733Z detected during: 2025-05-07T20:01:50.1690468Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.1718319Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.1746425Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.1762368Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:50.1763516Z 2025-05-07T20:01:50.1763761Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:50.1764140Z 2025-05-07T20:01:50.1764949Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.1766082Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.1766489Z ^ 2025-05-07T20:01:50.1766747Z detected during: 2025-05-07T20:01:50.1780913Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:50.1809364Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.1837065Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.1865181Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.1881231Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:50.1882386Z 2025-05-07T20:01:50.1883194Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.1884359Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.1884807Z ^ 2025-05-07T20:01:50.1885090Z detected during: 2025-05-07T20:01:50.1899777Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.1927419Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.1956317Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.1972226Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:50.1973382Z 2025-05-07T20:01:50.1973628Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:50.1973984Z 2025-05-07T20:01:50.1974955Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.1976159Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.1976591Z ^ 2025-05-07T20:01:50.1976827Z detected during: 2025-05-07T20:01:50.1990859Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:50.2019212Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.2046813Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.2075188Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.2091207Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:50.2092330Z 2025-05-07T20:01:50.2093144Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.2094290Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.2094812Z ^ 2025-05-07T20:01:50.2095108Z detected during: 2025-05-07T20:01:50.2109864Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.2137455Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.2165470Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.2181707Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:50.2182846Z 2025-05-07T20:01:50.2183119Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:50.2183488Z 2025-05-07T20:01:50.2184393Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.2185530Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.2185941Z ^ 2025-05-07T20:01:50.2186200Z detected during: 2025-05-07T20:01:50.2200070Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:50.2228414Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.2256118Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.2285565Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.2301684Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:50.2302891Z 2025-05-07T20:01:50.2303716Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.2304886Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.2305361Z ^ 2025-05-07T20:01:50.2305642Z detected during: 2025-05-07T20:01:50.2320326Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.2348048Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.2376377Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.2392390Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:50.2393555Z 2025-05-07T20:01:50.2393810Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:50.2394190Z 2025-05-07T20:01:50.2394991Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.2396120Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.2396531Z ^ 2025-05-07T20:01:50.2396782Z detected during: 2025-05-07T20:01:50.2410695Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:50.2438957Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.2466655Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.2494908Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.2510964Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:50.2512118Z 2025-05-07T20:01:50.2512920Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.2514083Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.2514532Z ^ 2025-05-07T20:01:50.2514829Z detected during: 2025-05-07T20:01:50.2529314Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.2556976Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.2585236Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.2601229Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:50.2602385Z 2025-05-07T20:01:50.2602630Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:50.2602991Z 2025-05-07T20:01:50.2603823Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.2604944Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.2605380Z ^ 2025-05-07T20:01:50.2605611Z detected during: 2025-05-07T20:01:50.2619524Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:50.2647828Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:50.2675662Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:50.2703965Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:50.2719901Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:01:50.2721027Z 2025-05-07T20:01:52.6716442Z [135/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o 2025-05-07T20:01:52.6729013Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:52.6730759Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.6731997Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:52.6732452Z ^ 2025-05-07T20:01:52.6732651Z 2025-05-07T20:01:52.6732896Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:52.6733255Z 2025-05-07T20:01:52.6734095Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.6735250Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:52.6735813Z ^ 2025-05-07T20:01:52.6735997Z 2025-05-07T20:01:52.6736787Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.6737951Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:52.6738424Z ^ 2025-05-07T20:01:52.6738688Z detected during: 2025-05-07T20:01:52.6753431Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:52.6793036Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:52.6821653Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:52.6837552Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:52.6838697Z 2025-05-07T20:01:52.6838949Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:52.6839328Z 2025-05-07T20:01:52.6840128Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.6841254Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:52.6841661Z ^ 2025-05-07T20:01:52.6841924Z detected during: 2025-05-07T20:01:52.6855862Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:52.6884236Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:52.6913731Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:52.6941971Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:52.6957876Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:52.6959040Z 2025-05-07T20:01:52.6959840Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.6961012Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:52.6961469Z ^ 2025-05-07T20:01:52.6961766Z detected during: 2025-05-07T20:01:52.6976678Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:52.7004316Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:52.7032478Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:52.7048401Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:52.7049556Z 2025-05-07T20:01:52.7049805Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:52.7050169Z 2025-05-07T20:01:52.7050989Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.7052161Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:52.7052602Z ^ 2025-05-07T20:01:52.7052845Z detected during: 2025-05-07T20:01:52.7066793Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:52.7095142Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:52.7122925Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:52.7151149Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:52.7167187Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:52.7168350Z 2025-05-07T20:01:52.7169181Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.7170346Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:52.7170800Z ^ 2025-05-07T20:01:52.7171093Z detected during: 2025-05-07T20:01:52.7185968Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:52.7213625Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:52.7242661Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:52.7258651Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:52.7259846Z 2025-05-07T20:01:52.7260143Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:52.7260507Z 2025-05-07T20:01:52.7261314Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.7262443Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:52.7262883Z ^ 2025-05-07T20:01:52.7263119Z detected during: 2025-05-07T20:01:52.7277208Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:52.7305418Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:52.7333089Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:52.7361303Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:52.7377458Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:52.7378595Z 2025-05-07T20:01:52.7379416Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.7380560Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:52.7381029Z ^ 2025-05-07T20:01:52.7381310Z detected during: 2025-05-07T20:01:52.7395992Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:52.7423583Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:52.7451727Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:52.7467781Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:52.7468946Z 2025-05-07T20:01:52.7469202Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:52.7469563Z 2025-05-07T20:01:52.7470393Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.7471508Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:52.7471935Z ^ 2025-05-07T20:01:52.7472173Z detected during: 2025-05-07T20:01:52.7486148Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:52.7514418Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:52.7542146Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:52.7571222Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:52.7587350Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:52.7588496Z 2025-05-07T20:01:52.7589330Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.7590485Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:52.7590960Z ^ 2025-05-07T20:01:52.7591230Z detected during: 2025-05-07T20:01:52.7605943Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:52.7633718Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:52.7661770Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:52.7677836Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:52.7678975Z 2025-05-07T20:01:52.7679255Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:52.7679617Z 2025-05-07T20:01:52.7680451Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.7681635Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:52.7682044Z ^ 2025-05-07T20:01:52.7682299Z detected during: 2025-05-07T20:01:52.7696249Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:52.7724541Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:52.7752224Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:52.7780516Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:52.7796646Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:52.7797786Z 2025-05-07T20:01:52.7798582Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.7799757Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:52.7800229Z ^ 2025-05-07T20:01:52.7800563Z detected during: 2025-05-07T20:01:52.7815264Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:52.7843089Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:52.7872205Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:52.7888378Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:52.7889537Z 2025-05-07T20:01:52.7889787Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:52.7890173Z 2025-05-07T20:01:52.7891116Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:52.7892265Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:52.7892684Z ^ 2025-05-07T20:01:52.7892952Z detected during: 2025-05-07T20:01:52.7906910Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:52.7935237Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:52.7962912Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:52.7991265Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:52.8007222Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:01:52.8008445Z 2025-05-07T20:01:54.7287834Z [136/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o 2025-05-07T20:01:54.7309779Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:54.7312499Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:54.7314483Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:54.7315230Z ^ 2025-05-07T20:01:54.7315508Z 2025-05-07T20:01:54.7315935Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:54.7316519Z 2025-05-07T20:01:54.7317899Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:54.7319914Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:54.7320685Z ^ 2025-05-07T20:01:54.7321148Z 2025-05-07T20:01:54.7322631Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:54.7324804Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:54.7325551Z ^ 2025-05-07T20:01:54.7326023Z detected during: 2025-05-07T20:01:54.7352570Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:54.7401457Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:54.7452546Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:54.7479826Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:54.7481399Z 2025-05-07T20:01:54.7481728Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:54.7482194Z 2025-05-07T20:01:54.7483282Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:54.7484734Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:54.7485306Z ^ 2025-05-07T20:01:54.7485665Z detected during: 2025-05-07T20:01:54.7510559Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:54.7558353Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:54.7609235Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:54.7663469Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:54.7692227Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:54.7694439Z 2025-05-07T20:01:54.7695897Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:54.7698011Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:54.7698809Z ^ 2025-05-07T20:01:54.7699295Z detected during: 2025-05-07T20:01:54.7724834Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:54.7775106Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:54.7824483Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:54.7852947Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:54.7855079Z 2025-05-07T20:01:54.7855566Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:54.7856144Z 2025-05-07T20:01:54.7857440Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:54.7859347Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:54.7860033Z ^ 2025-05-07T20:01:54.7860420Z detected during: 2025-05-07T20:01:54.7886313Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:54.7938844Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:54.7990470Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:54.8032424Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:54.8062258Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:54.8064321Z 2025-05-07T20:01:54.8065765Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:54.8067898Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:54.8068709Z ^ 2025-05-07T20:01:54.8069102Z detected during: 2025-05-07T20:01:54.8095878Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:54.8146402Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:54.8198706Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:54.8227585Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:54.8229731Z 2025-05-07T20:01:54.8230169Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:54.8230821Z 2025-05-07T20:01:54.8232173Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:54.8234124Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:54.8234858Z ^ 2025-05-07T20:01:54.8235274Z detected during: 2025-05-07T20:01:54.8259844Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:54.8312683Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:54.8364238Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:54.8416902Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:54.8446605Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:54.8448739Z 2025-05-07T20:01:54.8450176Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:54.8452263Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:54.8453031Z ^ 2025-05-07T20:01:54.8453488Z detected during: 2025-05-07T20:01:54.8480481Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:54.8531256Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:54.8581352Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:54.8610332Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:54.8612487Z 2025-05-07T20:01:54.8612951Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:54.8613618Z 2025-05-07T20:01:54.8615160Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:54.8617180Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:54.8617947Z ^ 2025-05-07T20:01:54.8618357Z detected during: 2025-05-07T20:01:54.8643959Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:54.8695027Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:54.8745272Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:54.8796266Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:54.8824347Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:54.8826388Z 2025-05-07T20:01:54.8827847Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:54.8829875Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:54.8830612Z ^ 2025-05-07T20:01:54.8831059Z detected during: 2025-05-07T20:01:54.8856978Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:54.8893401Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:54.8921555Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:54.8937541Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:54.8938698Z 2025-05-07T20:01:54.8938945Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:54.8939307Z 2025-05-07T20:01:54.8940139Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:54.8941255Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:54.8941758Z ^ 2025-05-07T20:01:54.8941994Z detected during: 2025-05-07T20:01:54.8955870Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:54.8984216Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:54.9011687Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:54.9039735Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:54.9055625Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:54.9056777Z 2025-05-07T20:01:54.9057579Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:54.9058783Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:54.9059251Z ^ 2025-05-07T20:01:54.9059542Z detected during: 2025-05-07T20:01:54.9074065Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:54.9101812Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:54.9129771Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:54.9145651Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:54.9146773Z 2025-05-07T20:01:54.9147045Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:54.9148279Z 2025-05-07T20:01:54.9149093Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:54.9150270Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:54.9150705Z ^ 2025-05-07T20:01:54.9150943Z detected during: 2025-05-07T20:01:54.9164791Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:54.9193193Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:54.9220677Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:54.9248657Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:54.9264563Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:01:54.9265692Z 2025-05-07T20:01:58.6646735Z [137/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o 2025-05-07T20:01:58.6659500Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:58.6661092Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.6662236Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.6662710Z ^ 2025-05-07T20:01:58.6662884Z 2025-05-07T20:01:58.6663148Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:58.6663506Z 2025-05-07T20:01:58.6664327Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.6665502Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:58.6665941Z ^ 2025-05-07T20:01:58.6666135Z 2025-05-07T20:01:58.6667017Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.6668185Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.6668626Z ^ 2025-05-07T20:01:58.6668922Z detected during: 2025-05-07T20:01:58.6683591Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.6711171Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.6739094Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.6754971Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:58.6756208Z 2025-05-07T20:01:58.6756457Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:58.6756819Z 2025-05-07T20:01:58.6757668Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.6758788Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.6759223Z ^ 2025-05-07T20:01:58.6759463Z detected during: 2025-05-07T20:01:58.6773296Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:58.6801594Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.6829120Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.6858510Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.6874418Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:58.6875686Z 2025-05-07T20:01:58.6876519Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.6877666Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.6878145Z ^ 2025-05-07T20:01:58.6878423Z detected during: 2025-05-07T20:01:58.6893025Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.6920491Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.6948483Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.6964403Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:58.6965540Z 2025-05-07T20:01:58.6965804Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:58.6966165Z 2025-05-07T20:01:58.6966966Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.6968098Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.6968517Z ^ 2025-05-07T20:01:58.6968776Z detected during: 2025-05-07T20:01:58.6982817Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:58.7010772Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.7038133Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.7066023Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.7081955Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:58.7083089Z 2025-05-07T20:01:58.7083894Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.7085066Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.7085545Z ^ 2025-05-07T20:01:58.7085813Z detected during: 2025-05-07T20:01:58.7100327Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.7127717Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.7155615Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.7172396Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:58.7173533Z 2025-05-07T20:01:58.7173780Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:58.7174164Z 2025-05-07T20:01:58.7175113Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.7176329Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.7176743Z ^ 2025-05-07T20:01:58.7177035Z detected during: 2025-05-07T20:01:58.7190810Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:58.7218833Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.7246214Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.7274040Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.7289940Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:58.7291104Z 2025-05-07T20:01:58.7291912Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.7293077Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.7293529Z ^ 2025-05-07T20:01:58.7293826Z detected during: 2025-05-07T20:01:58.7308303Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.7335844Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.7363834Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.7379816Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:58.7380978Z 2025-05-07T20:01:58.7381226Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:58.7381583Z 2025-05-07T20:01:58.7382413Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.7383524Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.7383950Z ^ 2025-05-07T20:01:58.7384187Z detected during: 2025-05-07T20:01:58.7398079Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:58.7426059Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.7453413Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.7481488Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.7498144Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:58.7499276Z 2025-05-07T20:01:58.7500098Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.7501236Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.7501709Z ^ 2025-05-07T20:01:58.7501978Z detected during: 2025-05-07T20:01:58.7516537Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.7543946Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.7571816Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.7587823Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:58.7588968Z 2025-05-07T20:01:58.7589234Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:58.7589589Z 2025-05-07T20:01:58.7590386Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.7591567Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.7591981Z ^ 2025-05-07T20:01:58.7592233Z detected during: 2025-05-07T20:01:58.7605981Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:58.7634066Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.7661437Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.7689368Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.7705293Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:58.7706428Z 2025-05-07T20:01:58.7707227Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.7708440Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.7708904Z ^ 2025-05-07T20:01:58.7709167Z detected during: 2025-05-07T20:01:58.7723598Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.7751171Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.7779357Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.7795188Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:58.7796346Z 2025-05-07T20:01:58.7796620Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:58.7797002Z 2025-05-07T20:01:58.7797809Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.7798914Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.7799341Z ^ 2025-05-07T20:01:58.7799594Z detected during: 2025-05-07T20:01:58.7814225Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:58.7842573Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.7870150Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.7898347Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.7914308Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:01:58.7915557Z 2025-05-07T20:02:05.8041166Z [138/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o 2025-05-07T20:02:05.8053686Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:05.8055285Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.8056557Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.8057042Z ^ 2025-05-07T20:02:05.8057218Z 2025-05-07T20:02:05.8057490Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.8057850Z 2025-05-07T20:02:05.8058668Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.8059856Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:05.8060489Z ^ 2025-05-07T20:02:05.8060663Z 2025-05-07T20:02:05.8061487Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.8062643Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.8063082Z ^ 2025-05-07T20:02:05.8063372Z detected during: 2025-05-07T20:02:05.8078350Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.8106721Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.8135553Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.8151775Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:02:05.8152933Z 2025-05-07T20:02:05.8153177Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.8153536Z 2025-05-07T20:02:05.8154356Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.8155466Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.8155901Z ^ 2025-05-07T20:02:05.8156137Z detected during: 2025-05-07T20:02:05.8170084Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.8198706Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.8226929Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.8257558Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.8273798Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:02:05.8275099Z 2025-05-07T20:02:05.8275930Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.8277075Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.8277550Z ^ 2025-05-07T20:02:05.8277844Z detected during: 2025-05-07T20:02:05.8292652Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.8320875Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.8349477Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.8365593Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:02:05.8366759Z 2025-05-07T20:02:05.8367008Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.8367371Z 2025-05-07T20:02:05.8368197Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.8369302Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.8369806Z ^ 2025-05-07T20:02:05.8370039Z detected during: 2025-05-07T20:02:05.8384082Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.8412635Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.8440926Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.8469536Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.8485840Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:02:05.8487010Z 2025-05-07T20:02:05.8487901Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.8489079Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.8489529Z ^ 2025-05-07T20:02:05.8489817Z detected during: 2025-05-07T20:02:05.8504666Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.8532787Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.8561330Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.8579086Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:02:05.8580245Z 2025-05-07T20:02:05.8580495Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.8580856Z 2025-05-07T20:02:05.8581683Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.8582797Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.8583236Z ^ 2025-05-07T20:02:05.8583466Z detected during: 2025-05-07T20:02:05.8597438Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.8625862Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.8653910Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.8682919Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.8699868Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:02:05.8701020Z 2025-05-07T20:02:05.8701846Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.8703000Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.8703468Z ^ 2025-05-07T20:02:05.8703756Z detected during: 2025-05-07T20:02:05.8718626Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.8747082Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.8775977Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.8792128Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:02:05.8793253Z 2025-05-07T20:02:05.8793519Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.8793876Z 2025-05-07T20:02:05.8794676Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.8795798Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.8796231Z ^ 2025-05-07T20:02:05.8796462Z detected during: 2025-05-07T20:02:05.8810285Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.8838707Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.8866817Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.8895556Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.8912698Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:02:05.8913901Z 2025-05-07T20:02:05.8914702Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.8915889Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.8916362Z ^ 2025-05-07T20:02:05.8916639Z detected during: 2025-05-07T20:02:05.8931465Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.8972988Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.9002009Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.9018243Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:02:05.9019401Z 2025-05-07T20:02:05.9019649Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.9020029Z 2025-05-07T20:02:05.9020839Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.9021965Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.9022389Z ^ 2025-05-07T20:02:05.9022656Z detected during: 2025-05-07T20:02:05.9036470Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.9064885Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.9093080Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.9121663Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.9137889Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:02:05.9139058Z 2025-05-07T20:02:05.9139862Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.9141035Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.9141490Z ^ 2025-05-07T20:02:05.9141796Z detected during: 2025-05-07T20:02:05.9156700Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.9185023Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.9213541Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.9229811Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:02:05.9230975Z 2025-05-07T20:02:05.9231220Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.9231586Z 2025-05-07T20:02:05.9232412Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.9233526Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.9234786Z ^ 2025-05-07T20:02:05.9235027Z detected during: 2025-05-07T20:02:05.9249098Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.9277724Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.9305836Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.9334440Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.9350790Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:02:05.9351939Z 2025-05-07T20:02:05.9363506Z [139/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o 2025-05-07T20:02:05.9375941Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:05.9377534Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.9378707Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.9379166Z ^ 2025-05-07T20:02:05.9379365Z 2025-05-07T20:02:05.9379613Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.9379971Z 2025-05-07T20:02:05.9380821Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.9381983Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:05.9382439Z ^ 2025-05-07T20:02:05.9382610Z 2025-05-07T20:02:05.9383457Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.9384618Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.9385088Z ^ 2025-05-07T20:02:05.9385356Z detected during: 2025-05-07T20:02:05.9400094Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.9428345Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.9456842Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.9473113Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:05.9474257Z 2025-05-07T20:02:05.9474504Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.9475021Z 2025-05-07T20:02:05.9475825Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.9476965Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.9477381Z ^ 2025-05-07T20:02:05.9477641Z detected during: 2025-05-07T20:02:05.9491493Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.9519881Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.9548092Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.9577884Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.9594077Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:05.9595228Z 2025-05-07T20:02:05.9596026Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.9597193Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.9597650Z ^ 2025-05-07T20:02:05.9597944Z detected during: 2025-05-07T20:02:05.9612707Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.9640842Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.9669457Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.9685861Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:05.9687014Z 2025-05-07T20:02:05.9687263Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.9687622Z 2025-05-07T20:02:05.9688445Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.9689554Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.9689987Z ^ 2025-05-07T20:02:05.9690232Z detected during: 2025-05-07T20:02:05.9704191Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.9732680Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.9760937Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.9789763Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.9806021Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:05.9807182Z 2025-05-07T20:02:05.9808048Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.9809198Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.9809674Z ^ 2025-05-07T20:02:05.9809962Z detected during: 2025-05-07T20:02:05.9824745Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.9852854Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.9882489Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.9902150Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:05.9903312Z 2025-05-07T20:02:05.9903583Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.9903948Z 2025-05-07T20:02:05.9904754Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.9905894Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.9906322Z ^ 2025-05-07T20:02:05.9906555Z detected during: 2025-05-07T20:02:05.9920405Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.9948931Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.9977376Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:06.0006015Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:06.0022350Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:06.0023489Z 2025-05-07T20:02:06.0024296Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.0025467Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.0025933Z ^ 2025-05-07T20:02:06.0026198Z detected during: 2025-05-07T20:02:06.0040919Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:06.0069162Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:06.0097903Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:06.0114186Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:06.0115334Z 2025-05-07T20:02:06.0115588Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:06.0115974Z 2025-05-07T20:02:06.0116771Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.0117918Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.0118325Z ^ 2025-05-07T20:02:06.0118586Z detected during: 2025-05-07T20:02:06.0132427Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:06.0160856Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:06.0189191Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:06.0218493Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:06.0234662Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:06.0235822Z 2025-05-07T20:02:06.0236647Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.0237841Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.0238288Z ^ 2025-05-07T20:02:06.0238575Z detected during: 2025-05-07T20:02:06.0253378Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:06.0281630Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:06.0310217Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:06.0326548Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:06.0327698Z 2025-05-07T20:02:06.0327946Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:06.0328305Z 2025-05-07T20:02:06.0329123Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.0330228Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.0330652Z ^ 2025-05-07T20:02:06.0330887Z detected during: 2025-05-07T20:02:06.0344719Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:06.0373274Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:06.0401588Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:06.0430137Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:06.0446326Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:06.0447483Z 2025-05-07T20:02:06.0448282Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.0449425Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.0449891Z ^ 2025-05-07T20:02:06.0450177Z detected during: 2025-05-07T20:02:06.0464968Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:06.0493167Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:06.0522578Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:06.0538809Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:06.0539943Z 2025-05-07T20:02:06.0540212Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:06.0540571Z 2025-05-07T20:02:06.0541367Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:06.0542490Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:06.0542914Z ^ 2025-05-07T20:02:06.0543155Z detected during: 2025-05-07T20:02:06.0557118Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:06.0585874Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:06.0613840Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:06.0642569Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:06.0658767Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:02:06.0659945Z 2025-05-07T20:02:10.0685589Z [140/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o 2025-05-07T20:02:10.0698171Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:10.0699739Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.0700890Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.0701328Z ^ 2025-05-07T20:02:10.0701503Z 2025-05-07T20:02:10.0701740Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.0702092Z 2025-05-07T20:02:10.0702921Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.0704059Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:10.0704496Z ^ 2025-05-07T20:02:10.0704662Z 2025-05-07T20:02:10.0705448Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.0706666Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.0707148Z ^ 2025-05-07T20:02:10.0707399Z detected during: 2025-05-07T20:02:10.0721907Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.0749585Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.0777696Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.0793656Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.0794792Z 2025-05-07T20:02:10.0795073Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.0795474Z 2025-05-07T20:02:10.0796274Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.0797395Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.0797806Z ^ 2025-05-07T20:02:10.0798054Z detected during: 2025-05-07T20:02:10.0811793Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.0839762Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.0868084Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.0896193Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.0911997Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.0913191Z 2025-05-07T20:02:10.0914020Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.0915181Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.0915631Z ^ 2025-05-07T20:02:10.0915925Z detected during: 2025-05-07T20:02:10.0930319Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.0957614Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.0985479Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1001269Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1002446Z 2025-05-07T20:02:10.1002697Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.1003055Z 2025-05-07T20:02:10.1003876Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1004989Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1005421Z ^ 2025-05-07T20:02:10.1005654Z detected during: 2025-05-07T20:02:10.1019586Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.1047457Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1075006Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1103025Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1118898Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1120028Z 2025-05-07T20:02:10.1120857Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1122003Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1122471Z ^ 2025-05-07T20:02:10.1122747Z detected during: 2025-05-07T20:02:10.1137222Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1164592Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1193552Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1209518Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1210657Z 2025-05-07T20:02:10.1210928Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.1211290Z 2025-05-07T20:02:10.1212091Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1213224Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1213636Z ^ 2025-05-07T20:02:10.1213897Z detected during: 2025-05-07T20:02:10.1227662Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.1255672Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1283217Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1311024Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1326870Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1328008Z 2025-05-07T20:02:10.1329234Z ptxas /tmp/tmpxft_00008c96_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 835; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:10.1331763Z ptxas /tmp/tmpxft_00008c96_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 848; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:10.1334287Z ptxas /tmp/tmpxft_00008c96_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 988; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:10.1336881Z ptxas /tmp/tmpxft_00008c96_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 1001; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:10.1339013Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1340173Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1340642Z ^ 2025-05-07T20:02:10.1340909Z detected during: 2025-05-07T20:02:10.1355406Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1382922Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1410862Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1426718Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1427923Z 2025-05-07T20:02:10.1428206Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.1428593Z 2025-05-07T20:02:10.1429396Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1430529Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1430933Z ^ 2025-05-07T20:02:10.1431188Z detected during: 2025-05-07T20:02:10.1444927Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.1472908Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1500467Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1529303Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1545217Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1546377Z 2025-05-07T20:02:10.1547183Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1548346Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1548796Z ^ 2025-05-07T20:02:10.1549085Z detected during: 2025-05-07T20:02:10.1563462Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1591105Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1618893Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1634743Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1635900Z 2025-05-07T20:02:10.1636149Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.1636508Z 2025-05-07T20:02:10.1637327Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1638430Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1638862Z ^ 2025-05-07T20:02:10.1639096Z detected during: 2025-05-07T20:02:10.1652900Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.1681013Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1708498Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1736428Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1752371Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1753502Z 2025-05-07T20:02:10.1754322Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1755467Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1755954Z ^ 2025-05-07T20:02:10.1756253Z detected during: 2025-05-07T20:02:10.1770658Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1798183Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1825999Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1842625Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1843758Z 2025-05-07T20:02:10.1844020Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.1844375Z 2025-05-07T20:02:10.1845171Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1846304Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1846731Z ^ 2025-05-07T20:02:10.1846970Z detected during: 2025-05-07T20:02:10.1860852Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.1889020Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.1916498Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.1944447Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.1960253Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:10.1961390Z 2025-05-07T20:02:10.1972564Z [141/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o 2025-05-07T20:02:10.1984714Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:10.1986260Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1987484Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.1987989Z ^ 2025-05-07T20:02:10.1988162Z 2025-05-07T20:02:10.1988405Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.1988782Z 2025-05-07T20:02:10.1989600Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.1990757Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:10.1991216Z ^ 2025-05-07T20:02:10.1991388Z 2025-05-07T20:02:10.5795252Z [142/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o 2025-05-07T20:02:10.5807789Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:10.5809353Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.5810523Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.5810980Z ^ 2025-05-07T20:02:10.5811273Z 2025-05-07T20:02:10.5811603Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.5811962Z 2025-05-07T20:02:10.5812811Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.5813964Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:10.5814433Z ^ 2025-05-07T20:02:10.5814604Z 2025-05-07T20:02:10.5815492Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.5816633Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.5817101Z ^ 2025-05-07T20:02:10.5817378Z detected during: 2025-05-07T20:02:10.5831940Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.5859348Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.5887352Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.5903366Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:10.5904523Z 2025-05-07T20:02:10.5904777Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.5905136Z 2025-05-07T20:02:10.5905957Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.5907071Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.5907503Z ^ 2025-05-07T20:02:10.5907733Z detected during: 2025-05-07T20:02:10.5921497Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.5949569Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.5978428Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.6006366Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.6022327Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:10.6023451Z 2025-05-07T20:02:10.6024286Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.6025434Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.6025902Z ^ 2025-05-07T20:02:10.6026172Z detected during: 2025-05-07T20:02:10.6040679Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.6068122Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.6096214Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.6112184Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:10.6113308Z 2025-05-07T20:02:10.6113578Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.6113946Z 2025-05-07T20:02:10.6114747Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.6115876Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.6116293Z ^ 2025-05-07T20:02:10.6116560Z detected during: 2025-05-07T20:02:10.6130443Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.6158275Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.6185877Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.6213845Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.6229715Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:10.6230844Z 2025-05-07T20:02:10.6231652Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.6232814Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.6233296Z ^ 2025-05-07T20:02:10.6233572Z detected during: 2025-05-07T20:02:10.6248107Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.6275657Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.6304426Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.6320345Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:10.6321503Z 2025-05-07T20:02:10.6321755Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.6322133Z 2025-05-07T20:02:10.6322931Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.6324037Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.6324463Z ^ 2025-05-07T20:02:10.6324729Z detected during: 2025-05-07T20:02:10.6338490Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.6366328Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.6394023Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.6421853Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.6437600Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:10.6438741Z 2025-05-07T20:02:10.6439970Z ptxas /tmp/tmpxft_00008c99_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 835; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:10.6442500Z ptxas /tmp/tmpxft_00008c99_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 848; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:10.6445084Z ptxas /tmp/tmpxft_00008c99_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 988; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:10.6447588Z ptxas /tmp/tmpxft_00008c99_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 1001; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:10.6449711Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.6450878Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.6451328Z ^ 2025-05-07T20:02:10.6451625Z detected during: 2025-05-07T20:02:10.6466017Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.6493527Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.6521391Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.6537308Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:10.6538455Z 2025-05-07T20:02:10.6538708Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.6539067Z 2025-05-07T20:02:10.6539889Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.6540987Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.6541407Z ^ 2025-05-07T20:02:10.6541639Z detected during: 2025-05-07T20:02:10.6555447Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.6583381Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.6611557Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.6639442Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.6655327Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:10.6656472Z 2025-05-07T20:02:10.6657298Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.6658429Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.6658895Z ^ 2025-05-07T20:02:10.6659193Z detected during: 2025-05-07T20:02:10.6673546Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.6701174Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.6729026Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.6744860Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:10.6745987Z 2025-05-07T20:02:10.6746259Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.6746616Z 2025-05-07T20:02:10.6747418Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.6748541Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.6748967Z ^ 2025-05-07T20:02:10.6749194Z detected during: 2025-05-07T20:02:10.6762979Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.6791310Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.6818773Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.6846556Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.6862395Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:10.6863527Z 2025-05-07T20:02:10.6864319Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.6865481Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.6865960Z ^ 2025-05-07T20:02:10.6866231Z detected during: 2025-05-07T20:02:10.6880861Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.6908265Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.6936819Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.6952605Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:10.6953735Z 2025-05-07T20:02:10.6953988Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:10.6954372Z 2025-05-07T20:02:10.6955177Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:10.6956311Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:10.6956720Z ^ 2025-05-07T20:02:10.6957007Z detected during: 2025-05-07T20:02:10.6970766Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:10.6998864Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:10.7026172Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:10.7053915Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:10.7069761Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:10.7070904Z 2025-05-07T20:02:12.8891030Z [143/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o 2025-05-07T20:02:12.8903804Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:12.8905368Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.8906538Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.8907011Z ^ 2025-05-07T20:02:12.8907192Z 2025-05-07T20:02:12.8907438Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:12.8907809Z 2025-05-07T20:02:12.8908647Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.8909789Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:12.8910248Z ^ 2025-05-07T20:02:12.8910418Z 2025-05-07T20:02:12.8911233Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.8912368Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.8912833Z ^ 2025-05-07T20:02:12.8913093Z detected during: 2025-05-07T20:02:12.8929261Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.8957422Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.8986079Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.9002294Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:12.9003427Z 2025-05-07T20:02:12.9003695Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:12.9004097Z 2025-05-07T20:02:12.9004905Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.9006087Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.9006535Z ^ 2025-05-07T20:02:12.9006824Z detected during: 2025-05-07T20:02:12.9021594Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.9049726Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.9078393Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.9094724Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:12.9095932Z 2025-05-07T20:02:12.9096178Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:12.9096544Z 2025-05-07T20:02:12.9097362Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.9098499Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.9098963Z ^ 2025-05-07T20:02:12.9099253Z detected during: 2025-05-07T20:02:12.9114098Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.9142140Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.9170603Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.9186933Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:12.9188073Z 2025-05-07T20:02:12.9188344Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:12.9188712Z 2025-05-07T20:02:12.9189935Z ptxas /tmp/tmpxft_00008c8d_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 889; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:12.9192467Z ptxas /tmp/tmpxft_00008c8d_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 896; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:12.9195091Z ptxas /tmp/tmpxft_00008c8d_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 903; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:12.9197666Z ptxas /tmp/tmpxft_00008c8d_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 910; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:12.9200167Z ptxas /tmp/tmpxft_00008c8d_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1044; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:12.9202710Z ptxas /tmp/tmpxft_00008c8d_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1051; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:12.9205225Z ptxas /tmp/tmpxft_00008c8d_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1058; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:12.9207755Z ptxas /tmp/tmpxft_00008c8d_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1065; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:12.9209973Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.9211110Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.9211577Z ^ 2025-05-07T20:02:12.9211846Z detected during: 2025-05-07T20:02:12.9226592Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.9255338Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.9294167Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.9310586Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:12.9311747Z 2025-05-07T20:02:12.9311999Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:12.9312369Z 2025-05-07T20:02:12.9313204Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.9314365Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.9314835Z ^ 2025-05-07T20:02:12.9315110Z detected during: 2025-05-07T20:02:12.9329868Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.9357897Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.9386682Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.9402834Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:12.9403966Z 2025-05-07T20:02:12.9404212Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:12.9404593Z 2025-05-07T20:02:12.9405387Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:12.9406546Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:12.9406994Z ^ 2025-05-07T20:02:12.9407287Z detected during: 2025-05-07T20:02:12.9422081Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:12.9450274Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:12.9478866Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:12.9494972Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:12.9496175Z 2025-05-07T20:02:12.9496429Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:12.9496788Z 2025-05-07T20:02:14.0693848Z [144/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o 2025-05-07T20:02:14.0706550Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:14.0708113Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.0709291Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.0709774Z ^ 2025-05-07T20:02:14.0709953Z 2025-05-07T20:02:14.0710200Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.0710579Z 2025-05-07T20:02:14.0711406Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.0712580Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:14.0713029Z ^ 2025-05-07T20:02:14.0713201Z 2025-05-07T20:02:14.0714026Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.0715160Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.0715689Z ^ 2025-05-07T20:02:14.0715954Z detected during: 2025-05-07T20:02:14.0730951Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.0759322Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.0788101Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.0804291Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:14.0805462Z 2025-05-07T20:02:14.0805737Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.0806101Z 2025-05-07T20:02:14.0806908Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.0808067Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.0808514Z ^ 2025-05-07T20:02:14.0808808Z detected during: 2025-05-07T20:02:14.0823793Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.0851791Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.0880529Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.0897013Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:14.0898174Z 2025-05-07T20:02:14.0898424Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.0898814Z 2025-05-07T20:02:14.0899624Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.0900764Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.0901249Z ^ 2025-05-07T20:02:14.0901549Z detected during: 2025-05-07T20:02:14.0916377Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.0944401Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.0973000Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.0989519Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:14.0990649Z 2025-05-07T20:02:14.0990917Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.0991281Z 2025-05-07T20:02:14.0992084Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.0993245Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.0993711Z ^ 2025-05-07T20:02:14.0993974Z detected during: 2025-05-07T20:02:14.1010126Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.1038339Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.1066828Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.1083134Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:14.1084270Z 2025-05-07T20:02:14.1084544Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.1084910Z 2025-05-07T20:02:14.1085800Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1086963Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1087428Z ^ 2025-05-07T20:02:14.1087703Z detected during: 2025-05-07T20:02:14.1102688Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.1130876Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.1159310Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.1176030Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:14.1177261Z 2025-05-07T20:02:14.1177520Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.1178046Z 2025-05-07T20:02:14.1179262Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1180447Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1180901Z ^ 2025-05-07T20:02:14.1181204Z detected during: 2025-05-07T20:02:14.1195967Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.1224051Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.1252634Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.1268855Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:14.1270007Z 2025-05-07T20:02:14.1270261Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.1270623Z 2025-05-07T20:02:14.1282281Z [145/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o 2025-05-07T20:02:14.1294535Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:14.1296516Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1297680Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1298153Z ^ 2025-05-07T20:02:14.1298328Z 2025-05-07T20:02:14.1298577Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.1298953Z 2025-05-07T20:02:14.1299769Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1300938Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:14.1301380Z ^ 2025-05-07T20:02:14.1301571Z 2025-05-07T20:02:14.1302369Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1303521Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1303957Z ^ 2025-05-07T20:02:14.1304235Z detected during: 2025-05-07T20:02:14.1319019Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.1348097Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.1377192Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.1393344Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:14.1394501Z 2025-05-07T20:02:14.1394752Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.1395132Z 2025-05-07T20:02:14.1395932Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1397074Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1397540Z ^ 2025-05-07T20:02:14.1397895Z detected during: 2025-05-07T20:02:14.1412655Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.1440619Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.1469131Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.1485390Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:14.1486598Z 2025-05-07T20:02:14.1486866Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.1487222Z 2025-05-07T20:02:14.1488067Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1489225Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1489693Z ^ 2025-05-07T20:02:14.1489960Z detected during: 2025-05-07T20:02:14.1504963Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.1533011Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.1561687Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.1578216Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:14.1579355Z 2025-05-07T20:02:14.1579605Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.1579994Z 2025-05-07T20:02:14.1580810Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1581983Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1582424Z ^ 2025-05-07T20:02:14.1582714Z detected during: 2025-05-07T20:02:14.1597605Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.1625658Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.1654790Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.1671017Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:14.1672186Z 2025-05-07T20:02:14.1672438Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.1672796Z 2025-05-07T20:02:14.1673594Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1674863Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1675339Z ^ 2025-05-07T20:02:14.1675612Z detected during: 2025-05-07T20:02:14.1690564Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.1718948Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.1747470Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.1763601Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:14.1764727Z 2025-05-07T20:02:14.1764978Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.1765359Z 2025-05-07T20:02:14.1766199Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1767388Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1767836Z ^ 2025-05-07T20:02:14.1768130Z detected during: 2025-05-07T20:02:14.1783349Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.1811279Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.1839693Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.1855887Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:14.1857043Z 2025-05-07T20:02:14.1857292Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.1857654Z 2025-05-07T20:02:15.0030970Z [146/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o 2025-05-07T20:02:15.0043460Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:15.0045041Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:15.0046202Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:15.0046652Z ^ 2025-05-07T20:02:15.0046846Z 2025-05-07T20:02:15.0047092Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:15.0047455Z 2025-05-07T20:02:15.0048362Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:15.0049583Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:15.0050051Z ^ 2025-05-07T20:02:15.0050228Z 2025-05-07T20:02:15.0051019Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:15.0052171Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:15.0052640Z ^ 2025-05-07T20:02:15.0052906Z detected during: 2025-05-07T20:02:15.0067820Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:15.0096173Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:15.0126029Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:15.0142364Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:15.0143505Z 2025-05-07T20:02:15.0143756Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:15.0144146Z 2025-05-07T20:02:15.0144953Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:15.0146112Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:15.0146556Z ^ 2025-05-07T20:02:15.0146853Z detected during: 2025-05-07T20:02:15.0161582Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:15.0189996Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:15.0218588Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:15.0234778Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:15.0235944Z 2025-05-07T20:02:15.0236191Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:15.0236550Z 2025-05-07T20:02:15.0237374Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:15.0238517Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:15.0239021Z ^ 2025-05-07T20:02:15.0239292Z detected during: 2025-05-07T20:02:15.0254250Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:15.0282457Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:15.0311116Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:15.0327294Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:15.0328448Z 2025-05-07T20:02:15.0328695Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:15.0329074Z 2025-05-07T20:02:15.0329877Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:15.0331031Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:15.0331473Z ^ 2025-05-07T20:02:15.0331761Z detected during: 2025-05-07T20:02:15.0346599Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:15.0374920Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:15.0403587Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:15.0419805Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:15.0420961Z 2025-05-07T20:02:15.0421212Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:15.0421572Z 2025-05-07T20:02:15.0422405Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:15.0423560Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:15.0424030Z ^ 2025-05-07T20:02:15.0424300Z detected during: 2025-05-07T20:02:15.0439545Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:15.0467792Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:15.0496544Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:15.0512839Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:15.0513966Z 2025-05-07T20:02:15.0514241Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:15.0514599Z 2025-05-07T20:02:15.0515395Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:15.0516549Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:15.0516995Z ^ 2025-05-07T20:02:15.0517287Z detected during: 2025-05-07T20:02:15.0532216Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:15.0560433Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:15.0589140Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:15.0605407Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:15.0606531Z 2025-05-07T20:02:15.0606776Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:15.0607217Z 2025-05-07T20:02:17.9453210Z [147/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o 2025-05-07T20:02:17.9465876Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:17.9467466Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:17.9468622Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:17.9469089Z ^ 2025-05-07T20:02:17.9469260Z 2025-05-07T20:02:17.9469502Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:17.9469883Z 2025-05-07T20:02:17.9470702Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:17.9471872Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:17.9472308Z ^ 2025-05-07T20:02:17.9472493Z 2025-05-07T20:02:17.9473362Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:17.9474587Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:17.9475187Z ^ 2025-05-07T20:02:17.9475545Z detected during: 2025-05-07T20:02:17.9490406Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:17.9518689Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:17.9547234Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:17.9563486Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:17.9564650Z 2025-05-07T20:02:17.9564896Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:17.9565256Z 2025-05-07T20:02:17.9566056Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:17.9567227Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:17.9567705Z ^ 2025-05-07T20:02:17.9567981Z detected during: 2025-05-07T20:02:17.9587204Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:17.9615792Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:17.9644398Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:17.9660664Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:17.9661822Z 2025-05-07T20:02:17.9662075Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:17.9662441Z 2025-05-07T20:02:17.9663272Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:17.9664411Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:17.9664882Z ^ 2025-05-07T20:02:17.9665158Z detected during: 2025-05-07T20:02:17.9680093Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:17.9708305Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:17.9736842Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:17.9753034Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:17.9754168Z 2025-05-07T20:02:17.9754436Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:17.9754795Z 2025-05-07T20:02:17.9755595Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:17.9756754Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:17.9757225Z ^ 2025-05-07T20:02:17.9757515Z detected during: 2025-05-07T20:02:17.9772325Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:17.9800600Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:17.9829147Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:17.9845322Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:17.9846511Z 2025-05-07T20:02:17.9846784Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:17.9847147Z 2025-05-07T20:02:17.9847970Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:17.9849106Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:17.9849569Z ^ 2025-05-07T20:02:17.9849840Z detected during: 2025-05-07T20:02:17.9864686Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:17.9893112Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:17.9922458Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:17.9938730Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:17.9939857Z 2025-05-07T20:02:17.9940123Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:17.9940491Z 2025-05-07T20:02:17.9941294Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:17.9942454Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:17.9942923Z ^ 2025-05-07T20:02:17.9943196Z detected during: 2025-05-07T20:02:17.9958082Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:17.9986494Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:18.0015078Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:18.0031274Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:18.0032436Z 2025-05-07T20:02:18.0032689Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.0033136Z 2025-05-07T20:02:18.8661109Z [148/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o 2025-05-07T20:02:18.8673901Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:18.8675738Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.8676884Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.8677358Z ^ 2025-05-07T20:02:18.8677536Z 2025-05-07T20:02:18.8677805Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.8678173Z 2025-05-07T20:02:18.8678997Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.8680174Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:18.8680617Z ^ 2025-05-07T20:02:18.8680813Z 2025-05-07T20:02:18.8681672Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.8682816Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.8683262Z ^ 2025-05-07T20:02:18.8683559Z detected during: 2025-05-07T20:02:18.8698424Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:18.8726636Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:18.8756489Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:18.8772638Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:18.8773786Z 2025-05-07T20:02:18.8774035Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.8774425Z 2025-05-07T20:02:18.8775424Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.8776631Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.8777104Z ^ 2025-05-07T20:02:18.8777371Z detected during: 2025-05-07T20:02:18.8792236Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:18.8820350Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:18.8848833Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:18.8865083Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:18.8866209Z 2025-05-07T20:02:18.8866476Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.8866859Z 2025-05-07T20:02:18.8867671Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.8868841Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.8869286Z ^ 2025-05-07T20:02:18.8869579Z detected during: 2025-05-07T20:02:18.8884466Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:18.8912607Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:18.8941057Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:18.8957193Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:18.8958349Z 2025-05-07T20:02:18.8958598Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.8958956Z 2025-05-07T20:02:18.8960197Z ptxas /tmp/tmpxft_00008c93_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 889; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:18.8962743Z ptxas /tmp/tmpxft_00008c93_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 896; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:18.8965263Z ptxas /tmp/tmpxft_00008c93_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 903; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:18.8967814Z ptxas /tmp/tmpxft_00008c93_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 910; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:18.8970351Z ptxas /tmp/tmpxft_00008c93_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1044; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:18.8972870Z ptxas /tmp/tmpxft_00008c93_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1051; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:18.8975592Z ptxas /tmp/tmpxft_00008c93_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1058; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:18.8978118Z ptxas /tmp/tmpxft_00008c93_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1065; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:18.8980334Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.8981495Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.8981938Z ^ 2025-05-07T20:02:18.8982225Z detected during: 2025-05-07T20:02:18.8996949Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:18.9025060Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:18.9053613Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:18.9070444Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:18.9071572Z 2025-05-07T20:02:18.9071840Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.9072199Z 2025-05-07T20:02:18.9072994Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.9074157Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.9074635Z ^ 2025-05-07T20:02:18.9075062Z detected during: 2025-05-07T20:02:18.9089827Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:18.9117970Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:18.9146426Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:18.9162531Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:18.9163670Z 2025-05-07T20:02:18.9163915Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.9164286Z 2025-05-07T20:02:18.9165090Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.9166246Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.9166726Z ^ 2025-05-07T20:02:18.9167011Z detected during: 2025-05-07T20:02:18.9181852Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:18.9209978Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:18.9238383Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:18.9254461Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:18.9255652Z 2025-05-07T20:02:18.9255904Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.9256266Z 2025-05-07T20:02:18.9602560Z [149/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o 2025-05-07T20:02:18.9615063Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:18.9616900Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.9618046Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.9618513Z ^ 2025-05-07T20:02:18.9618692Z 2025-05-07T20:02:18.9618960Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.9619319Z 2025-05-07T20:02:18.9620146Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.9621325Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:18.9621771Z ^ 2025-05-07T20:02:18.9621968Z 2025-05-07T20:02:18.9622762Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.9623911Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.9624421Z ^ 2025-05-07T20:02:18.9624723Z detected during: 2025-05-07T20:02:18.9639580Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:18.9667543Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:18.9696501Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:18.9712692Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:18.9714431Z 2025-05-07T20:02:18.9714673Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.9715025Z 2025-05-07T20:02:18.9715836Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.9716944Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.9717397Z ^ 2025-05-07T20:02:18.9717656Z detected during: 2025-05-07T20:02:18.9732128Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:18.9760118Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:18.9788536Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:18.9804875Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:18.9806017Z 2025-05-07T20:02:18.9806263Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.9806647Z 2025-05-07T20:02:18.9807448Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.9808620Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.9809066Z ^ 2025-05-07T20:02:18.9809364Z detected during: 2025-05-07T20:02:18.9824201Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:18.9852512Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:18.9881389Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:18.9897809Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:18.9898975Z 2025-05-07T20:02:18.9899219Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.9899572Z 2025-05-07T20:02:18.9900394Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.9901539Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.9901998Z ^ 2025-05-07T20:02:18.9902270Z detected during: 2025-05-07T20:02:18.9917037Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:18.9944969Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:18.9973008Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:18.9989520Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:18.9990663Z 2025-05-07T20:02:18.9990933Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:18.9991333Z 2025-05-07T20:02:18.9992136Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:18.9993295Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:18.9993740Z ^ 2025-05-07T20:02:18.9994023Z detected during: 2025-05-07T20:02:19.0008596Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.0037184Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.0065814Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.0082245Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:19.0083394Z 2025-05-07T20:02:19.0083646Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.0084004Z 2025-05-07T20:02:19.0084826Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.0085958Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.0086414Z ^ 2025-05-07T20:02:19.0086712Z detected during: 2025-05-07T20:02:19.0101586Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.0129494Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.0158090Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.0173810Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:19.0175168Z 2025-05-07T20:02:19.0175483Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.0175841Z 2025-05-07T20:02:20.5849583Z [150/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o 2025-05-07T20:02:20.5862245Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:20.5863905Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:20.5865083Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:20.5865535Z ^ 2025-05-07T20:02:20.5865737Z 2025-05-07T20:02:20.5865986Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:20.5866348Z 2025-05-07T20:02:20.5867171Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:20.5868354Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:20.5868828Z ^ 2025-05-07T20:02:20.5869001Z 2025-05-07T20:02:20.5869799Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:20.5870947Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:20.5871411Z ^ 2025-05-07T20:02:20.5871677Z detected during: 2025-05-07T20:02:20.5886772Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:20.5914876Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:20.5943114Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:20.5960113Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:20.5961237Z 2025-05-07T20:02:20.5961518Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:20.5961896Z 2025-05-07T20:02:20.5962685Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:20.5963809Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:20.5964317Z ^ 2025-05-07T20:02:20.5964619Z detected during: 2025-05-07T20:02:20.5979498Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:20.6007297Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:20.6035802Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:20.6051665Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:20.6052797Z 2025-05-07T20:02:20.6053059Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:20.6053438Z 2025-05-07T20:02:20.6054223Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:20.6055401Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:20.6056036Z ^ 2025-05-07T20:02:20.6056310Z detected during: 2025-05-07T20:02:20.6071192Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:20.6099554Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:20.6127661Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:20.6143820Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:20.6144950Z 2025-05-07T20:02:20.6145195Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:20.6145579Z 2025-05-07T20:02:20.6146382Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:20.6147665Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:20.6148109Z ^ 2025-05-07T20:02:20.6148396Z detected during: 2025-05-07T20:02:20.6162834Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:20.6191006Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:20.6219331Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:20.6235535Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:20.6236655Z 2025-05-07T20:02:20.6236897Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:20.6237248Z 2025-05-07T20:02:20.6238057Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:20.6239201Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:20.6239670Z ^ 2025-05-07T20:02:20.6239938Z detected during: 2025-05-07T20:02:20.6254342Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:20.6283279Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:20.6312015Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:20.6327780Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:20.6328878Z 2025-05-07T20:02:20.6329149Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:20.6329520Z 2025-05-07T20:02:20.6330332Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:20.6331468Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:20.6331905Z ^ 2025-05-07T20:02:20.6332189Z detected during: 2025-05-07T20:02:20.6357263Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:20.6386234Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:20.6414391Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:20.6430746Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:20.6431867Z 2025-05-07T20:02:20.6432115Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:20.6432468Z 2025-05-07T20:02:49.5126796Z [151/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o 2025-05-07T20:02:49.5139266Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:49.5140944Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:49.5142113Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:49.5142595Z ^ 2025-05-07T20:02:49.5142772Z 2025-05-07T20:02:49.5143022Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:49.5143476Z 2025-05-07T20:02:49.5144297Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:49.5145489Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:49.5145929Z ^ 2025-05-07T20:02:49.5146100Z 2025-05-07T20:02:49.5147066Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:49.5148451Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:49.5148932Z ^ 2025-05-07T20:02:49.5149141Z 2025-05-07T20:02:49.5150016Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:49.5151207Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:49.5151685Z ^ 2025-05-07T20:02:49.5151903Z 2025-05-07T20:02:49.5152782Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:49.5153972Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:49.5154425Z ^ 2025-05-07T20:02:49.5154661Z 2025-05-07T20:02:49.5154892Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:49.5155244Z 2025-05-07T20:02:49.5156119Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:49.5157329Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:49.5157786Z ^ 2025-05-07T20:02:49.5158024Z 2025-05-07T20:02:49.5158884Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:49.5160096Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:49.5160541Z ^ 2025-05-07T20:02:49.5160770Z 2025-05-07T20:02:49.5161001Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:49.5161331Z 2025-05-07T20:02:49.5162210Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:49.5163431Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:49.5163905Z ^ 2025-05-07T20:02:49.5164126Z 2025-05-07T20:02:49.5164982Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:49.5166229Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:49.5166702Z ^ 2025-05-07T20:02:49.5166911Z 2025-05-07T20:02:49.5167138Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:49.5167494Z 2025-05-07T20:02:49.5168354Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:49.5169558Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:49.5170013Z ^ 2025-05-07T20:02:49.5170254Z 2025-05-07T20:02:49.5171117Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:49.5172320Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:49.5172767Z ^ 2025-05-07T20:02:49.5172977Z 2025-05-07T20:02:49.5173232Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:49.5173559Z 2025-05-07T20:02:49.5174409Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:49.5176055Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:49.5176637Z ^ 2025-05-07T20:02:49.5176883Z 2025-05-07T20:02:49.5177859Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:49.5179163Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:49.5179667Z ^ 2025-05-07T20:02:49.5179888Z 2025-05-07T20:02:49.5180133Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:49.5180510Z 2025-05-07T20:02:49.5181428Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:49.5182738Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:49.5183230Z ^ 2025-05-07T20:02:49.5183470Z 2025-05-07T20:02:49.5184470Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:49.5185764Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:49.5186266Z ^ 2025-05-07T20:02:49.5186486Z 2025-05-07T20:02:49.5186753Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:49.5187110Z 2025-05-07T20:02:49.5188139Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:02:49.5189503Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:02:49.5189974Z ^ 2025-05-07T20:02:49.5190200Z 2025-05-07T20:03:20.0885531Z [152/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o 2025-05-07T20:03:20.0897756Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:03:20.0899365Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:03:20.0900716Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:03:20.0901225Z ^ 2025-05-07T20:03:20.0901414Z 2025-05-07T20:03:20.0901670Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:20.0902067Z 2025-05-07T20:03:20.0902898Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:03:20.0904125Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:03:20.0904628Z ^ 2025-05-07T20:03:20.0904811Z 2025-05-07T20:03:20.0905790Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:20.0907105Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:20.0907638Z ^ 2025-05-07T20:03:20.0907974Z 2025-05-07T20:03:20.0908862Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:20.0910064Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:20.0910570Z ^ 2025-05-07T20:03:20.0910797Z 2025-05-07T20:03:20.0911695Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:20.0912903Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:20.0913395Z ^ 2025-05-07T20:03:20.0913609Z 2025-05-07T20:03:20.0913844Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:20.0914248Z 2025-05-07T20:03:20.0915116Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:20.0916375Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:20.0916843Z ^ 2025-05-07T20:03:20.0917101Z 2025-05-07T20:03:20.0917974Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:20.0919212Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:20.0919672Z ^ 2025-05-07T20:03:20.0919928Z 2025-05-07T20:03:20.0920147Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:20.0920520Z 2025-05-07T20:03:20.0921392Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:20.0922618Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:20.0923104Z ^ 2025-05-07T20:03:20.0923364Z 2025-05-07T20:03:20.0924227Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:20.0925455Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:20.0925945Z ^ 2025-05-07T20:03:20.0926163Z 2025-05-07T20:03:20.0926434Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:20.0926773Z 2025-05-07T20:03:20.0927641Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:20.0928874Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:20.0929354Z ^ 2025-05-07T20:03:20.0929580Z 2025-05-07T20:03:20.0930448Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:20.0931679Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:20.0932162Z ^ 2025-05-07T20:03:20.0932383Z 2025-05-07T20:03:20.0932612Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:20.0932972Z 2025-05-07T20:03:20.0933834Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:20.0935058Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:20.0935785Z ^ 2025-05-07T20:03:20.0936077Z 2025-05-07T20:03:20.0937087Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:20.0938396Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:20.0938907Z ^ 2025-05-07T20:03:20.0939131Z 2025-05-07T20:03:20.0939402Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:20.0939759Z 2025-05-07T20:03:20.0940685Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:20.0941988Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:20.0942504Z ^ 2025-05-07T20:03:20.0942750Z 2025-05-07T20:03:20.0943689Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:20.0945047Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:20.0945561Z ^ 2025-05-07T20:03:20.0945786Z 2025-05-07T20:03:20.0946033Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:20.0946398Z 2025-05-07T20:03:20.0947362Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:20.0948726Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:20.0949225Z ^ 2025-05-07T20:03:20.0949447Z 2025-05-07T20:03:29.6576150Z [153/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o 2025-05-07T20:03:29.6589708Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:03:29.6591234Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:03:29.6592363Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:03:29.6592880Z ^ 2025-05-07T20:03:29.6593068Z 2025-05-07T20:03:29.6593304Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:29.6593649Z 2025-05-07T20:03:29.6594459Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:03:29.6595690Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:03:29.6596133Z ^ 2025-05-07T20:03:29.6596296Z 2025-05-07T20:03:29.6597214Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:29.6598491Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:29.6598969Z ^ 2025-05-07T20:03:29.6599183Z 2025-05-07T20:03:29.6600129Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:29.6601372Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:29.6601864Z ^ 2025-05-07T20:03:29.6602205Z 2025-05-07T20:03:29.6603098Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:29.6604290Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:29.6604747Z ^ 2025-05-07T20:03:29.6604951Z 2025-05-07T20:03:29.6605173Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:29.6605516Z 2025-05-07T20:03:29.6606362Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:29.6607593Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:29.6608075Z ^ 2025-05-07T20:03:29.6608309Z 2025-05-07T20:03:29.6609167Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:29.6610371Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:29.6610849Z ^ 2025-05-07T20:03:29.6611052Z 2025-05-07T20:03:29.6611272Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:29.6611621Z 2025-05-07T20:03:29.6612472Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:29.6613667Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:29.6614111Z ^ 2025-05-07T20:03:29.6614348Z 2025-05-07T20:03:29.6615339Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:29.6616815Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:29.6617291Z ^ 2025-05-07T20:03:29.6617548Z 2025-05-07T20:03:29.6617801Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:29.6618154Z 2025-05-07T20:03:29.6619075Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:29.6620367Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:29.6620862Z ^ 2025-05-07T20:03:29.6621092Z 2025-05-07T20:03:29.6622115Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:29.6623325Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:29.6623773Z ^ 2025-05-07T20:03:29.6623974Z 2025-05-07T20:03:29.6624198Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:29.6624522Z 2025-05-07T20:03:29.6625394Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:29.6626567Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:29.6627019Z ^ 2025-05-07T20:03:29.6627235Z 2025-05-07T20:03:29.6628106Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:29.6629352Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:29.6629809Z ^ 2025-05-07T20:03:29.6630011Z 2025-05-07T20:03:29.6630252Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:29.6630579Z 2025-05-07T20:03:29.6631432Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:29.6632618Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:29.6633078Z ^ 2025-05-07T20:03:29.6633295Z 2025-05-07T20:03:29.6634151Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __device__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:29.6635345Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:29.6635778Z ^ 2025-05-07T20:03:29.6636000Z 2025-05-07T20:03:29.6636250Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:29.6636572Z 2025-05-07T20:03:29.6637437Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp(205): warning #20012-D: __host__ annotation is ignored on a function("packed_scale_t") that is explicitly defaulted on its first declaration 2025-05-07T20:03:29.6638619Z __inline__ __attribute__((always_inline)) __attribute__((device)) __attribute__((host)) 2025-05-07T20:03:29.6639103Z ^ 2025-05-07T20:03:29.6639318Z 2025-05-07T20:03:30.3114449Z [154/156] : && /github/home/miniconda/envs/build_binary/bin/c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -fopenmp=libgomp -stdlib=libstdc++ -I/github/home/miniconda/envs/build_binary/include -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,fbgemm_gpu_experimental_gen_ai.so -o experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib:/github/home/miniconda/envs/build_binary/lib/stubs: /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so" -Wl,--as-needed -lcudadevrt -lcudart_static -ldl && : 2025-05-07T20:03:30.5690162Z [155/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:30.5691983Z ################################################################################ 2025-05-07T20:03:30.5692337Z [CMAKE] Running post-build script ... 2025-05-07T20:03:30.5693049Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:30.5693745Z Removing all RPATHs ... 2025-05-07T20:03:30.5694037Z ################################################################################ 2025-05-07T20:03:30.5695038Z [155/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build && /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/cmake/data/bin/cmake -P cmake_install.cmake 2025-05-07T20:03:30.6756712Z -- Install configuration: "Release" 2025-05-07T20:03:30.6797256Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/asmjit.so 2025-05-07T20:03:30.6866077Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/fbgemm.so 2025-05-07T20:03:30.6922357Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:30.6947741Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench 2025-05-07T20:03:30.6977359Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/__init__.py 2025-05-07T20:03:30.6980475Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/ck_bf16_bench.py 2025-05-07T20:03:30.6986041Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/comm_bench.py 2025-05-07T20:03:30.6990325Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/gather_scatter_bench.py 2025-05-07T20:03:30.6991564Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/quantize_bench.py 2025-05-07T20:03:30.6992623Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/quantize_ops.py 2025-05-07T20:03:30.7002978Z -- Up-to-date: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:30.7008136Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:30.7057507Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/README.md 2025-05-07T20:03:30.7065135Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/__init__.py 2025-05-07T20:03:30.7068414Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/activation.py 2025-05-07T20:03:30.7069768Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py 2025-05-07T20:03:30.7071012Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/layers.py 2025-05-07T20:03:30.7071982Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/shuffling.py 2025-05-07T20:03:30.7072968Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/__init__.py 2025-05-07T20:03:30.7079475Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/quantize.py 2025-05-07T20:03:30.7148177Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:30.7202461Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/example/__init__.py 2025-05-07T20:03:30.7205484Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/example/utils.py 2025-05-07T20:03:30.7269483Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py 2025-05-07T20:03:30.7278184Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py 2025-05-07T20:03:30.7299754Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py 2025-05-07T20:03:30.7302638Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py 2025-05-07T20:03:30.7303757Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py 2025-05-07T20:03:30.7705636Z 2025-05-07T20:03:31.2620809Z 2025-05-07T20:03:31.2651949Z copying fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/__init__.py 2025-05-07T20:03:31.2816116Z copying fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/batched_unary_embeddings_ops.py 2025-05-07T20:03:31.2824007Z copying fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/enums.py 2025-05-07T20:03:31.2838188Z copying fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/metrics.py 2025-05-07T20:03:31.2841167Z copying fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules.py 2025-05-07T20:03:31.2879068Z copying fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules_split.py 2025-05-07T20:03:31.2881723Z copying fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize_comm.py 2025-05-07T20:03:31.2894789Z copying fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize_utils.py 2025-05-07T20:03:31.2908289Z copying fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/runtime_monitor.py 2025-05-07T20:03:31.2919206Z copying fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sparse_ops.py 2025-05-07T20:03:31.2936293Z copying fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_configs.py 2025-05-07T20:03:31.2938711Z copying fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_inference_converter.py 2025-05-07T20:03:31.2947473Z copying fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_optimizer_ops.py 2025-05-07T20:03:31.2954306Z copying fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_utils.py 2025-05-07T20:03:31.2962883Z copying fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops.py 2025-05-07T20:03:31.2968199Z copying fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_common.py 2025-05-07T20:03:31.2979918Z copying fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py 2025-05-07T20:03:31.2997269Z copying fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training.py 2025-05-07T20:03:31.3028023Z copying fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py 2025-05-07T20:03:31.3033424Z copying fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py 2025-05-07T20:03:31.3041434Z copying fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe_input_multiplexer.py 2025-05-07T20:03:31.3049705Z copying fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/uvm.py 2025-05-07T20:03:31.3059973Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/config 2025-05-07T20:03:31.3092232Z copying fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/config/__init__.py 2025-05-07T20:03:31.3105710Z copying fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/config/feature_list.py 2025-05-07T20:03:31.3110178Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs 2025-05-07T20:03:31.3136328Z copying fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/__init__.py 2025-05-07T20:03:31.3139825Z copying fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/common.py 2025-05-07T20:03:31.3151167Z copying fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/examples.py 2025-05-07T20:03:31.3159936Z copying fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/jagged_tensor_ops.py 2025-05-07T20:03:31.3169899Z copying fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/merge_pooled_embedding_ops.py 2025-05-07T20:03:31.3193608Z copying fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/permute_pooled_embedding_ops.py 2025-05-07T20:03:31.3196759Z copying fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/quantize_ops.py 2025-05-07T20:03:31.3203836Z copying fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/sparse_ops.py 2025-05-07T20:03:31.3219947Z copying fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/version.py 2025-05-07T20:03:31.3226513Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize 2025-05-07T20:03:31.3252094Z copying fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize/__init__.py 2025-05-07T20:03:31.3256090Z copying fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize/quantize_ops.py 2025-05-07T20:03:31.3262497Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll 2025-05-07T20:03:31.3264506Z copying fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/__init__.py 2025-05-07T20:03:31.3272654Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe 2025-05-07T20:03:31.3275250Z copying fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/__init__.py 2025-05-07T20:03:31.3282035Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton 2025-05-07T20:03:31.3284118Z copying fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/__init__.py 2025-05-07T20:03:31.3298204Z copying fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/common.py 2025-05-07T20:03:31.3301672Z copying fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/quantize.py 2025-05-07T20:03:31.3311919Z copying fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/quantize_ref.py 2025-05-07T20:03:31.3317938Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils 2025-05-07T20:03:31.3336479Z copying fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/__init__.py 2025-05-07T20:03:31.3343856Z copying fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/filestore.py 2025-05-07T20:03:31.3362404Z copying fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/loader.py 2025-05-07T20:03:31.3365237Z copying fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/torch_library.py 2025-05-07T20:03:31.3370017Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/cpu 2025-05-07T20:03:31.3370807Z copying fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/cpu/__init__.py 2025-05-07T20:03:31.3376495Z copying fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/cpu/cpu_sll.py 2025-05-07T20:03:31.3384789Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/meta 2025-05-07T20:03:31.3387026Z copying fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/meta/__init__.py 2025-05-07T20:03:31.3390122Z copying fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/meta/meta_sll.py 2025-05-07T20:03:31.3400332Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton 2025-05-07T20:03:31.3401136Z copying fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/__init__.py 2025-05-07T20:03:31.3418332Z copying fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/common.py 2025-05-07T20:03:31.3437616Z copying fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py 2025-05-07T20:03:31.3441285Z copying fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py 2025-05-07T20:03:31.3444883Z copying fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm.py 2025-05-07T20:03:31.3454459Z copying fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py 2025-05-07T20:03:31.3463668Z copying fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py 2025-05-07T20:03:31.3475008Z copying fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py 2025-05-07T20:03:31.3481947Z copying fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py 2025-05-07T20:03:31.3492549Z copying fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py 2025-05-07T20:03:31.3500855Z copying fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py 2025-05-07T20:03:31.3508273Z copying fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_softmax.py 2025-05-07T20:03:31.3536410Z copying fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py 2025-05-07T20:03:31.3539131Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench 2025-05-07T20:03:31.3539997Z copying fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/__init__.py 2025-05-07T20:03:31.3556260Z copying fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/bench_config.py 2025-05-07T20:03:31.3557736Z copying fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/bench_runs.py 2025-05-07T20:03:31.3578140Z copying fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/eeg_cli.py 2025-05-07T20:03:31.3582340Z copying fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py 2025-05-07T20:03:31.3599444Z copying fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/eval_compression.py 2025-05-07T20:03:31.3602461Z copying fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/reporter.py 2025-05-07T20:03:31.3607457Z copying fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config.py 2025-05-07T20:03:31.3625708Z copying fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py 2025-05-07T20:03:31.3652273Z copying fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py 2025-05-07T20:03:31.3655665Z copying fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/utils.py 2025-05-07T20:03:31.3662520Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/cache 2025-05-07T20:03:31.3664930Z copying fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/cache/__init__.py 2025-05-07T20:03:31.3687000Z copying fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py 2025-05-07T20:03:31.3692150Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:31.3693021Z copying fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/__init__.py 2025-05-07T20:03:31.3700654Z copying fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/common.py 2025-05-07T20:03:31.3708022Z copying fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/inference.py 2025-05-07T20:03:31.3717079Z copying fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/training.py 2025-05-07T20:03:31.3745202Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/stats 2025-05-07T20:03:31.3747741Z copying fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/stats/__init__.py 2025-05-07T20:03:31.3751465Z copying fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/stats/bench_params_reporter.py 2025-05-07T20:03:31.3756863Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils 2025-05-07T20:03:31.3757638Z copying fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/__init__.py 2025-05-07T20:03:31.3762758Z copying fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/common.py 2025-05-07T20:03:31.3778141Z copying fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/offsets.py 2025-05-07T20:03:31.3793745Z copying fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/quantize.py 2025-05-07T20:03:31.3797002Z copying fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/requests.py 2025-05-07T20:03:31.3807409Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:31.3808789Z copying fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/utils/__init__.py 2025-05-07T20:03:31.3816817Z copying fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py 2025-05-07T20:03:31.3843877Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/jagged 2025-05-07T20:03:31.3859608Z copying fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/jagged/__init__.py 2025-05-07T20:03:31.3866882Z copying fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py 2025-05-07T20:03:31.4016327Z 2025-05-07T20:03:32.2475799Z INFO:root:running bdist_wheel 2025-05-07T20:03:32.4574342Z INFO:root:running build 2025-05-07T20:03:32.4590531Z INFO:root:running build_py 2025-05-07T20:03:32.4978382Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5047419Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5051053Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5052542Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5053822Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5055271Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5058591Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5060046Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5095782Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5098935Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5100286Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5101958Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5103331Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5104725Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5106129Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5107455Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5108869Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5110333Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5121068Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5137553Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5139176Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5140623Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5142048Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.5150353Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/config 2025-05-07T20:03:32.5151466Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/config 2025-05-07T20:03:32.5152800Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/config 2025-05-07T20:03:32.5154004Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:32.5175678Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:32.5179749Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:32.5181151Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:32.5184311Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:32.5185981Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:32.5187520Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:32.5189184Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:32.5190553Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:32.5218126Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:32.5220045Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize 2025-05-07T20:03:32.5247051Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize 2025-05-07T20:03:32.5248475Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize 2025-05-07T20:03:32.5249569Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll 2025-05-07T20:03:32.5250556Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll 2025-05-07T20:03:32.5251561Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe 2025-05-07T20:03:32.5252537Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe 2025-05-07T20:03:32.5253554Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:03:32.5254750Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:03:32.5256664Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:03:32.5258097Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:03:32.5259549Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:03:32.5276784Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:03:32.5278100Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:03:32.5279488Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:03:32.5280830Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:03:32.5282375Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:03:32.5283588Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/cpu 2025-05-07T20:03:32.5284617Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/cpu 2025-05-07T20:03:32.5286523Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/cpu 2025-05-07T20:03:32.5288351Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/meta 2025-05-07T20:03:32.5289491Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/meta 2025-05-07T20:03:32.5290892Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/meta 2025-05-07T20:03:32.5293250Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:32.5295022Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:32.5296884Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:32.5298612Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:32.5300398Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:32.5302130Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:32.5303699Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:32.5328581Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:32.5330667Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:32.5332548Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:32.5334372Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:32.5336402Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:32.5338340Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:32.5340233Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:32.5341671Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:32.5342921Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:32.5344605Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:32.5346206Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:32.5347793Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:32.5349293Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:32.5350816Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:32.5352282Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:32.5353747Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:32.5373874Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:32.5375909Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:32.5377924Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:32.5379039Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/cache 2025-05-07T20:03:32.5380240Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/cache 2025-05-07T20:03:32.5381789Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/cache 2025-05-07T20:03:32.5383041Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:32.5384324Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:32.5385694Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:32.5389918Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:32.5397055Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:32.5399475Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/stats 2025-05-07T20:03:32.5400661Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/stats 2025-05-07T20:03:32.5402270Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/stats 2025-05-07T20:03:32.5403694Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:03:32.5404872Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:03:32.5406362Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:03:32.5408044Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:03:32.5409552Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:03:32.5411062Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:03:32.5412444Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:32.5413660Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:32.5415409Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:32.5417088Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/jagged 2025-05-07T20:03:32.5418296Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/jagged 2025-05-07T20:03:32.5436294Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/jagged 2025-05-07T20:03:32.6094153Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/asmjit.so -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.6139770Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/fbgemm.so -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:32.6407014Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:32.6410620Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:33.0694762Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench 2025-05-07T20:03:33.0696147Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench 2025-05-07T20:03:33.0699359Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/ck_bf16_bench.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench 2025-05-07T20:03:33.0705650Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/comm_bench.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench 2025-05-07T20:03:33.0711966Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/gather_scatter_bench.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench 2025-05-07T20:03:33.0718241Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/quantize_bench.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench 2025-05-07T20:03:33.0730962Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/quantize_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench 2025-05-07T20:03:33.0768237Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:33.0769574Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/README.md -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:33.0772229Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:33.0782483Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/activation.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:33.0795980Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:33.0814598Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/layers.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:33.0831899Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/shuffling.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:33.0836364Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:33.0848056Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/quantize.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:33.0854580Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/example 2025-05-07T20:03:33.0856365Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/example 2025-05-07T20:03:33.0896846Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/example/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/example 2025-05-07T20:03:33.0902191Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/example/utils.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/example 2025-05-07T20:03:33.0905659Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:33.0907070Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:33.0918299Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:33.0938329Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:33.0954846Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:33.0958516Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:33.0968404Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.0972810Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.0974332Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.0976142Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.0978172Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.0979793Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.0981689Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.0983076Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.0984383Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.0985724Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.0987323Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.0989034Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.0990594Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.0991909Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.0993281Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.0994686Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.0996125Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.0998105Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.1016524Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.1020601Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.1022071Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.1023360Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:03:33.1024649Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/config 2025-05-07T20:03:33.1026055Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/config 2025-05-07T20:03:33.1027659Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:33.1029032Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:33.1030292Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:33.1031574Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:33.1032929Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:33.1034329Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:33.1035656Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:33.1042829Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:33.1046786Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:03:33.1050396Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize 2025-05-07T20:03:33.1051785Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize 2025-05-07T20:03:33.1053076Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll 2025-05-07T20:03:33.1054268Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe 2025-05-07T20:03:33.1055796Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:03:33.1057195Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:03:33.1058570Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:03:33.1059991Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:03:33.1061386Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:03:33.1062740Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:03:33.1064167Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:03:33.1065602Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:03:33.1066986Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/cpu 2025-05-07T20:03:33.1068485Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/cpu 2025-05-07T20:03:33.1075194Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/meta 2025-05-07T20:03:33.1079255Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/meta 2025-05-07T20:03:33.1083125Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:33.1085198Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:33.1086714Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:33.1088456Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:33.1089983Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:33.1091427Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:33.1092966Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:33.1094576Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:33.1096673Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:33.1098355Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:33.1100090Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:33.1101747Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:33.1103556Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:03:33.1105275Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.1106744Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.1108202Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.1109661Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.1111268Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.1112780Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.1114284Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.1115734Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.1117212Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.1118886Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.1120290Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.1121614Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/cache 2025-05-07T20:03:33.1122992Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/cache 2025-05-07T20:03:33.1124374Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:33.1125883Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:33.1127241Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:33.1128651Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:33.1130363Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/stats 2025-05-07T20:03:33.1132242Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/stats 2025-05-07T20:03:33.1133904Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:03:33.1135380Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:03:33.1136800Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:03:33.1138221Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:03:33.1139654Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:03:33.1142533Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:33.1169838Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:33.1175423Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/jagged 2025-05-07T20:03:33.1180182Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/jagged 2025-05-07T20:03:33.1257153Z INFO:skbuild:copied 90 files 2025-05-07T20:03:33.1258061Z INFO:root:running build_ext 2025-05-07T20:03:33.1818240Z INFO:root:installing to _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel 2025-05-07T20:03:33.1819760Z INFO:root:running install 2025-05-07T20:03:33.2379444Z INFO:root:running install_lib 2025-05-07T20:03:33.2455408Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel 2025-05-07T20:03:33.2459885Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu 2025-05-07T20:03:33.2460632Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/config 2025-05-07T20:03:33.2461779Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/config 2025-05-07T20:03:33.2463270Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/config 2025-05-07T20:03:33.2464421Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/docs 2025-05-07T20:03:33.2465517Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:33.2466930Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:33.2468386Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:33.2470059Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:33.2471680Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:33.2473287Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:33.2475199Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:33.2476950Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:33.2478536Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:33.2479731Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/quantize 2025-05-07T20:03:33.2481109Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/quantize 2025-05-07T20:03:33.2482778Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/quantize 2025-05-07T20:03:33.2483959Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll 2025-05-07T20:03:33.2484659Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll/cpu 2025-05-07T20:03:33.2485771Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/cpu 2025-05-07T20:03:33.2487250Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/cpu 2025-05-07T20:03:33.2492270Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll/meta 2025-05-07T20:03:33.2495943Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/meta 2025-05-07T20:03:33.2499507Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/meta 2025-05-07T20:03:33.2500724Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll/triton 2025-05-07T20:03:33.2502074Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:33.2503611Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:33.2505220Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:33.2507087Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:33.2508754Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:33.2510409Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:33.2512135Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:33.2513947Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:33.2516371Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:33.2518125Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:33.2519921Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:33.2521680Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:33.2523381Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:33.2524986Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll 2025-05-07T20:03:33.2526050Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe 2025-05-07T20:03:33.2526777Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.2527923Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.2529461Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.2530997Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.2532524Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.2534182Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.2536089Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.2537806Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.2539499Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.2541394Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.2543193Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.2544962Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:33.2546176Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/cache 2025-05-07T20:03:33.2547403Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/cache 2025-05-07T20:03:33.2549163Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/cache 2025-05-07T20:03:33.2550436Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:33.2551281Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:33.2552567Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:33.2554353Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:33.2556020Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:03:33.2557498Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:03:33.2558970Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:03:33.2560485Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:03:33.2565028Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/stats 2025-05-07T20:03:33.2566207Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/stats 2025-05-07T20:03:33.2567805Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/stats 2025-05-07T20:03:33.2569026Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/utils 2025-05-07T20:03:33.2570132Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:33.2571653Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:33.2573185Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:33.2574949Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:33.2576851Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:33.2578436Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe 2025-05-07T20:03:33.2579659Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/triton 2025-05-07T20:03:33.2580462Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/triton/jagged 2025-05-07T20:03:33.2581740Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton/jagged 2025-05-07T20:03:33.2588687Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton/jagged 2025-05-07T20:03:33.2610710Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:03:33.2612251Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:03:33.2613956Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:03:33.2615802Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:03:33.2617013Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/utils 2025-05-07T20:03:33.2618189Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:03:33.2620022Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:03:33.2621623Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:03:33.2623215Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:03:33.2624724Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/asmjit.so -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.2626196Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/fbgemm.so -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.2662005Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental 2025-05-07T20:03:33.2664548Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:33.2669017Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:33.3258523Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:33.3262726Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe/README.md -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:33.3268579Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:33.3270694Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe/activation.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:33.3272452Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:33.3274211Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe/layers.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:33.3276430Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe/shuffling.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:33.3278211Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:33.3279971Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/quantize.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:33.3281492Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/bench 2025-05-07T20:03:33.3282781Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:33.3284472Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench/ck_bf16_bench.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:33.3286163Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench/comm_bench.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:33.3287870Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench/gather_scatter_bench.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:33.3289634Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench/quantize_bench.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:33.3291406Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench/quantize_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:33.3292668Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/example 2025-05-07T20:03:33.3294058Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/example 2025-05-07T20:03:33.3296226Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/example/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/example 2025-05-07T20:03:33.3298070Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/example/utils.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/example 2025-05-07T20:03:33.3299408Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gemm 2025-05-07T20:03:33.3300356Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:33.3301924Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:33.3303763Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:33.3305638Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:33.3307559Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:33.3309408Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:33.3311073Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3312505Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3313899Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3320476Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3325060Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3330037Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3331585Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3332986Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3334414Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3336117Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3337707Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3339352Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3340993Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3342616Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3344239Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3345911Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3347673Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3349468Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3351135Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3352763Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3354274Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3355638Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:33.3356469Z INFO:skbuild:copied 115 files 2025-05-07T20:03:33.3356776Z INFO:root:running install_egg_info 2025-05-07T20:03:33.4605368Z INFO:root:running egg_info 2025-05-07T20:03:33.4705605Z INFO:root:creating fbgemm_gpu_genai_nightly.egg-info 2025-05-07T20:03:33.4710104Z INFO:root:writing fbgemm_gpu_genai_nightly.egg-info/PKG-INFO 2025-05-07T20:03:33.5013949Z INFO:root:writing dependency_links to fbgemm_gpu_genai_nightly.egg-info/dependency_links.txt 2025-05-07T20:03:33.5055201Z INFO:root:writing requirements to fbgemm_gpu_genai_nightly.egg-info/requires.txt 2025-05-07T20:03:33.5060392Z INFO:root:writing top-level names to fbgemm_gpu_genai_nightly.egg-info/top_level.txt 2025-05-07T20:03:33.5107581Z INFO:root:writing manifest file 'fbgemm_gpu_genai_nightly.egg-info/SOURCES.txt' 2025-05-07T20:03:33.5613375Z INFO:root:reading manifest file 'fbgemm_gpu_genai_nightly.egg-info/SOURCES.txt' 2025-05-07T20:03:33.5667743Z INFO:root:writing manifest file 'fbgemm_gpu_genai_nightly.egg-info/SOURCES.txt' 2025-05-07T20:03:33.5675001Z INFO:root:Copying fbgemm_gpu_genai_nightly.egg-info to _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu_genai_nightly-2025.5.7-py3.13.egg-info 2025-05-07T20:03:33.5713119Z INFO:root:running install_scripts 2025-05-07T20:03:33.5714114Z INFO:skbuild:copied 0 files 2025-05-07T20:03:42.3217411Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu_genai_nightly-2025.5.7.dist-info/WHEEL 2025-05-07T20:03:42.3626540Z INFO:wheel:creating '/__w/FBGEMM/FBGEMM/fbgemm_gpu/dist/.tmp-99kfrfao/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl' and adding '_skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel' to it 2025-05-07T20:03:42.3776060Z INFO:wheel:adding 'fbgemm_gpu/__init__.py' 2025-05-07T20:03:42.4382373Z INFO:wheel:adding 'fbgemm_gpu/asmjit.so' 2025-05-07T20:03:42.4398161Z INFO:wheel:adding 'fbgemm_gpu/batched_unary_embeddings_ops.py' 2025-05-07T20:03:42.4399177Z INFO:wheel:adding 'fbgemm_gpu/enums.py' 2025-05-07T20:03:42.6404841Z INFO:wheel:adding 'fbgemm_gpu/fbgemm.so' 2025-05-07T20:03:42.6526595Z INFO:wheel:adding 'fbgemm_gpu/metrics.py' 2025-05-07T20:03:42.6527151Z INFO:wheel:adding 'fbgemm_gpu/permute_pooled_embedding_modules.py' 2025-05-07T20:03:42.6527704Z INFO:wheel:adding 'fbgemm_gpu/permute_pooled_embedding_modules_split.py' 2025-05-07T20:03:42.6530021Z INFO:wheel:adding 'fbgemm_gpu/quantize_comm.py' 2025-05-07T20:03:42.6533896Z INFO:wheel:adding 'fbgemm_gpu/quantize_utils.py' 2025-05-07T20:03:42.6536115Z INFO:wheel:adding 'fbgemm_gpu/runtime_monitor.py' 2025-05-07T20:03:42.6548152Z INFO:wheel:adding 'fbgemm_gpu/sparse_ops.py' 2025-05-07T20:03:42.6549944Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_configs.py' 2025-05-07T20:03:42.6552691Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_inference_converter.py' 2025-05-07T20:03:42.6554178Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_optimizer_ops.py' 2025-05-07T20:03:42.6555464Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_utils.py' 2025-05-07T20:03:42.6557462Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops.py' 2025-05-07T20:03:42.6560957Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_common.py' 2025-05-07T20:03:42.6581837Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_inference.py' 2025-05-07T20:03:42.6623931Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_training.py' 2025-05-07T20:03:42.6628257Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py' 2025-05-07T20:03:42.6630066Z INFO:wheel:adding 'fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py' 2025-05-07T20:03:42.6631222Z INFO:wheel:adding 'fbgemm_gpu/tbe_input_multiplexer.py' 2025-05-07T20:03:42.6632105Z INFO:wheel:adding 'fbgemm_gpu/uvm.py' 2025-05-07T20:03:42.6634022Z INFO:wheel:adding 'fbgemm_gpu/config/__init__.py' 2025-05-07T20:03:42.6635685Z INFO:wheel:adding 'fbgemm_gpu/config/feature_list.py' 2025-05-07T20:03:42.6637382Z INFO:wheel:adding 'fbgemm_gpu/docs/__init__.py' 2025-05-07T20:03:42.6638624Z INFO:wheel:adding 'fbgemm_gpu/docs/common.py' 2025-05-07T20:03:42.6640315Z INFO:wheel:adding 'fbgemm_gpu/docs/examples.py' 2025-05-07T20:03:42.6642645Z INFO:wheel:adding 'fbgemm_gpu/docs/jagged_tensor_ops.py' 2025-05-07T20:03:42.6644256Z INFO:wheel:adding 'fbgemm_gpu/docs/merge_pooled_embedding_ops.py' 2025-05-07T20:03:42.6646526Z INFO:wheel:adding 'fbgemm_gpu/docs/permute_pooled_embedding_ops.py' 2025-05-07T20:03:42.6647865Z INFO:wheel:adding 'fbgemm_gpu/docs/quantize_ops.py' 2025-05-07T20:03:42.6653915Z INFO:wheel:adding 'fbgemm_gpu/docs/sparse_ops.py' 2025-05-07T20:03:42.6655678Z INFO:wheel:adding 'fbgemm_gpu/docs/version.py' 2025-05-07T20:03:42.6657598Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/__init__.py' 2025-05-07T20:03:42.6659634Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/ck_bf16_bench.py' 2025-05-07T20:03:42.6663120Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/comm_bench.py' 2025-05-07T20:03:42.6666841Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/gather_scatter_bench.py' 2025-05-07T20:03:42.6672154Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/quantize_bench.py' 2025-05-07T20:03:42.6684773Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/quantize_ops.py' 2025-05-07T20:03:42.6687511Z INFO:wheel:adding 'fbgemm_gpu/experimental/example/__init__.py' 2025-05-07T20:03:42.6836211Z INFO:wheel:adding 'fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so' 2025-05-07T20:03:42.6845910Z INFO:wheel:adding 'fbgemm_gpu/experimental/example/utils.py' 2025-05-07T20:03:42.6847556Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py' 2025-05-07T20:03:42.6875881Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py' 2025-05-07T20:03:42.6881689Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py' 2025-05-07T20:03:42.6884707Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py' 2025-05-07T20:03:42.6886825Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/utils.py' 2025-05-07T20:03:42.6888560Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/__init__.py' 2025-05-07T20:03:44.6358214Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so' 2025-05-07T20:03:44.8358738Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/quantize.py' 2025-05-07T20:03:44.8360245Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/README.md' 2025-05-07T20:03:44.8361737Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/__init__.py' 2025-05-07T20:03:44.8363222Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/activation.py' 2025-05-07T20:03:44.8364829Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py' 2025-05-07T20:03:44.8374062Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/layers.py' 2025-05-07T20:03:44.8378804Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/shuffling.py' 2025-05-07T20:03:44.8380239Z INFO:wheel:adding 'fbgemm_gpu/quantize/__init__.py' 2025-05-07T20:03:44.8381996Z INFO:wheel:adding 'fbgemm_gpu/quantize/quantize_ops.py' 2025-05-07T20:03:44.8383785Z INFO:wheel:adding 'fbgemm_gpu/sll/__init__.py' 2025-05-07T20:03:44.8386255Z INFO:wheel:adding 'fbgemm_gpu/sll/cpu/__init__.py' 2025-05-07T20:03:44.8392318Z INFO:wheel:adding 'fbgemm_gpu/sll/cpu/cpu_sll.py' 2025-05-07T20:03:44.8394293Z INFO:wheel:adding 'fbgemm_gpu/sll/meta/__init__.py' 2025-05-07T20:03:44.8396630Z INFO:wheel:adding 'fbgemm_gpu/sll/meta/meta_sll.py' 2025-05-07T20:03:44.8398961Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/__init__.py' 2025-05-07T20:03:44.8400469Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/common.py' 2025-05-07T20:03:44.8401867Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py' 2025-05-07T20:03:44.8404271Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py' 2025-05-07T20:03:44.8408444Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_bmm.py' 2025-05-07T20:03:44.8412157Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py' 2025-05-07T20:03:44.8414165Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py' 2025-05-07T20:03:44.8415608Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py' 2025-05-07T20:03:44.8422173Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py' 2025-05-07T20:03:44.8427275Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py' 2025-05-07T20:03:44.8429109Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py' 2025-05-07T20:03:44.8432240Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_softmax.py' 2025-05-07T20:03:44.8438108Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py' 2025-05-07T20:03:44.8454876Z INFO:wheel:adding 'fbgemm_gpu/tbe/__init__.py' 2025-05-07T20:03:44.8456560Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/__init__.py' 2025-05-07T20:03:44.8457028Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/bench_config.py' 2025-05-07T20:03:44.8457466Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/bench_runs.py' 2025-05-07T20:03:44.8457883Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/eeg_cli.py' 2025-05-07T20:03:44.8458366Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/embedding_ops_common_config.py' 2025-05-07T20:03:44.8458885Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/eval_compression.py' 2025-05-07T20:03:44.8459333Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/reporter.py' 2025-05-07T20:03:44.8459778Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/tbe_data_config.py' 2025-05-07T20:03:44.8461806Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/tbe_data_config_loader.py' 2025-05-07T20:03:44.8464147Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py' 2025-05-07T20:03:44.8465735Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/utils.py' 2025-05-07T20:03:44.8467205Z INFO:wheel:adding 'fbgemm_gpu/tbe/cache/__init__.py' 2025-05-07T20:03:44.8468712Z INFO:wheel:adding 'fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py' 2025-05-07T20:03:44.8470100Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/__init__.py' 2025-05-07T20:03:44.8471350Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/common.py' 2025-05-07T20:03:44.8477498Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/inference.py' 2025-05-07T20:03:44.8502374Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/training.py' 2025-05-07T20:03:44.8506577Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/utils/__init__.py' 2025-05-07T20:03:44.8508482Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py' 2025-05-07T20:03:44.8510072Z INFO:wheel:adding 'fbgemm_gpu/tbe/stats/__init__.py' 2025-05-07T20:03:44.8513272Z INFO:wheel:adding 'fbgemm_gpu/tbe/stats/bench_params_reporter.py' 2025-05-07T20:03:44.8514594Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/__init__.py' 2025-05-07T20:03:44.8524900Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/common.py' 2025-05-07T20:03:44.8526729Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/offsets.py' 2025-05-07T20:03:44.8529071Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/quantize.py' 2025-05-07T20:03:44.8534964Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/requests.py' 2025-05-07T20:03:44.8537075Z INFO:wheel:adding 'fbgemm_gpu/triton/__init__.py' 2025-05-07T20:03:44.8538239Z INFO:wheel:adding 'fbgemm_gpu/triton/common.py' 2025-05-07T20:03:44.8546200Z INFO:wheel:adding 'fbgemm_gpu/triton/quantize.py' 2025-05-07T20:03:45.0579449Z INFO:wheel:adding 'fbgemm_gpu/triton/quantize_ref.py' 2025-05-07T20:03:45.0580808Z INFO:wheel:adding 'fbgemm_gpu/triton/jagged/__init__.py' 2025-05-07T20:03:45.0587897Z INFO:wheel:adding 'fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py' 2025-05-07T20:03:45.0589642Z INFO:wheel:adding 'fbgemm_gpu/utils/__init__.py' 2025-05-07T20:03:45.0591820Z INFO:wheel:adding 'fbgemm_gpu/utils/filestore.py' 2025-05-07T20:03:45.0593208Z INFO:wheel:adding 'fbgemm_gpu/utils/loader.py' 2025-05-07T20:03:45.0595373Z INFO:wheel:adding 'fbgemm_gpu/utils/torch_library.py' 2025-05-07T20:03:45.0598337Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/METADATA' 2025-05-07T20:03:45.0599952Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/WHEEL' 2025-05-07T20:03:45.0601595Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/top_level.txt' 2025-05-07T20:03:45.0650980Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/RECORD' 2025-05-07T20:03:45.0652270Z INFO:root:removing _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel 2025-05-07T20:03:45.1696115Z ╒════════════════════════════╤════════════════════════════════════════════════╕ 2025-05-07T20:03:45.1697783Z │ │ Version │ 2025-05-07T20:03:45.1698394Z ╞════════════════════════════╪════════════════════════════════════════════════╡ 2025-05-07T20:03:45.1698928Z │ PyTorch │ 2.8.0.dev20250507+cu128 │ 2025-05-07T20:03:45.1699545Z ├────────────────────────────┼────────────────────────────────────────────────┤ 2025-05-07T20:03:45.1700200Z │ CUDA (Declared by PyTorch) │ 12.8 │ 2025-05-07T20:03:45.1700851Z ├────────────────────────────┼────────────────────────────────────────────────┤ 2025-05-07T20:03:45.1701410Z │ CUDA (Actual) │ nvcc: NVIDIA (R) Cuda compiler driver │ 2025-05-07T20:03:45.1702111Z │ │ Copyright (c) 2005-2025 NVIDIA Corporation │ 2025-05-07T20:03:45.1702639Z │ │ Built on Wed_Jan_15_19:20:09_PST_2025 │ 2025-05-07T20:03:45.1703144Z │ │ Cuda compilation tools, release 12.8, V12.8.61 │ 2025-05-07T20:03:45.1703677Z │ │ Build cuda_12.8.r12.8/compiler.35404655_0 │ 2025-05-07T20:03:45.1704224Z ╘════════════════════════════╧════════════════════════════════════════════════╛ 2025-05-07T20:03:54.3804364Z Successfully built fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:03:58.2542124Z 2025-05-07T20:03:58.3733069Z ################################################################################ 2025-05-07T20:03:58.3735058Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:58.3736038Z [CHECK] Listing out library size: 2025-05-07T20:03:58.3754282Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:58.3755822Z 2025-05-07T20:03:58.3877607Z 91 ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:58.3878963Z 2025-05-07T20:03:58.3924664Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:58.3926581Z + objdump -TC ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:03:58.3927407Z 2025-05-07T20:03:58.5217484Z GLIBC_2.2.5 2025-05-07T20:03:58.5218177Z GLIBC_2.3 2025-05-07T20:03:58.5218712Z GLIBC_2.14 2025-05-07T20:03:58.5219067Z 2025-05-07T20:03:58.5219081Z 2025-05-07T20:03:58.5221091Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:58.5223689Z + objdump -TC ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:03:58.5224381Z 2025-05-07T20:03:58.5454290Z GLIBCXX_3.4 2025-05-07T20:03:58.5454965Z GLIBCXX_3.4.9 2025-05-07T20:03:58.5456776Z GLIBCXX_3.4.11 2025-05-07T20:03:58.5457484Z GLIBCXX_3.4.18 2025-05-07T20:03:58.5458092Z GLIBCXX_3.4.21 2025-05-07T20:03:58.5458666Z GLIBCXX_3.4.29 2025-05-07T20:03:58.5459081Z 2025-05-07T20:03:58.5459094Z 2025-05-07T20:03:58.6054165Z + nm -gDC ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so > /tmp/tmp.FM4uT46iXz.symbols.txt 2025-05-07T20:03:58.6054808Z 2025-05-07T20:03:58.6296328Z 2025-05-07T20:03:58.6624408Z [CHECK] Total Number of symbols: 2736 2025-05-07T20:03:58.6663089Z [CHECK] Number of fbgemm symbols: 676 2025-05-07T20:03:58.6688577Z + nm -gDCu ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so > /tmp/tmp.iaLk7g0ePf.usymbols.txt 2025-05-07T20:03:58.6690421Z 2025-05-07T20:03:58.6727740Z 2025-05-07T20:03:58.6765533Z [CHECK] Listing out undefined symbols (249 total): 2025-05-07T20:03:58.6782071Z U VTT for std::__cxx11::basic_ostringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:58.6783862Z U VTT for std::__cxx11::basic_stringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:58.6784460Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:03:58.6785087Z U __assert_fail@GLIBC_2.2.5 2025-05-07T20:03:58.6785450Z U __cudaPopCallConfiguration@libcudart.so.12 2025-05-07T20:03:58.6785893Z U __cudaPushCallConfiguration@libcudart.so.12 2025-05-07T20:03:58.6786298Z U __cudaRegisterFatBinary@libcudart.so.12 2025-05-07T20:03:58.6786706Z U __cudaRegisterFatBinaryEnd@libcudart.so.12 2025-05-07T20:03:58.6787088Z U __cudaRegisterFunction@libcudart.so.12 2025-05-07T20:03:58.6787472Z U __cudaRegisterVar@libcudart.so.12 2025-05-07T20:03:58.6787869Z U __cudaUnregisterFatBinary@libcudart.so.12 2025-05-07T20:03:58.6788245Z U __cxa_allocate_exception@CXXABI_1.3 2025-05-07T20:03:58.6788606Z U __cxa_atexit@GLIBC_2.2.5 2025-05-07T20:03:58.6788932Z U __cxa_begin_catch@CXXABI_1.3 2025-05-07T20:03:58.6789379Z U __cxa_end_catch@CXXABI_1.3 2025-05-07T20:03:58.6789714Z U __cxa_free_exception@CXXABI_1.3 2025-05-07T20:03:58.6790084Z U __cxa_guard_abort@CXXABI_1.3 2025-05-07T20:03:58.6790420Z U __cxa_guard_acquire@CXXABI_1.3 2025-05-07T20:03:58.6790778Z U __cxa_guard_release@CXXABI_1.3 2025-05-07T20:03:58.6791126Z U __cxa_rethrow@CXXABI_1.3 2025-05-07T20:03:58.6791460Z U __cxa_thread_atexit@CXXABI_1.3.7 2025-05-07T20:03:58.6791921Z U __cxa_throw@CXXABI_1.3 2025-05-07T20:03:58.6792236Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:03:58.6792576Z U __tls_get_addr@GLIBC_2.3 2025-05-07T20:03:58.6792878Z U __udivti3@GCC_3.0 2025-05-07T20:03:58.6793177Z U __xstat@GLIBC_2.2.5 2025-05-07T20:03:58.6793505Z U at::CUDAGeneratorImpl::device_type() 2025-05-07T20:03:58.6793921Z U at::CUDAGeneratorImpl::philox_cuda_state(unsigned long) 2025-05-07T20:03:58.6794341Z U at::TensorMaker::make_tensor() 2025-05-07T20:03:58.6794900Z U at::_ops::add__Tensor::call(at::Tensor&, at::Tensor const&, c10::Scalar const&) 2025-05-07T20:03:58.6795443Z U at::_ops::div__Scalar::call(at::Tensor&, c10::Scalar const&) 2025-05-07T20:03:58.6796259Z U at::_ops::empty_like::call(at::Tensor const&, std::optional, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:03:58.6797537Z U at::_ops::empty_memory_format::call(c10::ArrayRef, std::optional, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:03:58.6798494Z U at::_ops::expand::call(at::Tensor const&, c10::ArrayRef, bool) 2025-05-07T20:03:58.6798996Z U at::_ops::index_select::call(at::Tensor const&, long, at::Tensor const&) 2025-05-07T20:03:58.6799493Z U at::_ops::norm_Scalar::call(at::Tensor const&, c10::Scalar const&) 2025-05-07T20:03:58.6800015Z U at::_ops::scatter_add_::call(at::Tensor&, long, at::Tensor const&, at::Tensor const&) 2025-05-07T20:03:58.6800514Z U at::_ops::select_int::call(at::Tensor const&, long, c10::SymInt) 2025-05-07T20:03:58.6801027Z U at::_ops::split_sizes::call(at::Tensor const&, c10::ArrayRef, long) 2025-05-07T20:03:58.6801756Z U at::_ops::sum_dim_IntList::call(at::Tensor const&, c10::OptionalArrayRef, bool, std::optional) 2025-05-07T20:03:58.6802523Z U at::_ops::to_dtype::call(at::Tensor const&, c10::ScalarType, bool, bool, std::optional) 2025-05-07T20:03:58.6803516Z U at::_ops::to_dtype_layout::call(at::Tensor const&, std::optional, std::optional, std::optional, std::optional, bool, bool, std::optional) 2025-05-07T20:03:58.6804610Z U at::_ops::unsqueeze::call(at::Tensor const&, long) 2025-05-07T20:03:58.6805043Z U at::_ops::view::call(at::Tensor const&, c10::ArrayRef) 2025-05-07T20:03:58.6805824Z U at::_ops::zeros::call(c10::ArrayRef, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:03:58.6806584Z U at::cuda::detail::getDefaultCUDAGenerator(signed char) 2025-05-07T20:03:58.6806995Z U at::cuda::getCurrentDeviceProperties() 2025-05-07T20:03:58.6807427Z U at::tensor(c10::ArrayRef, c10::TensorOptions const&) 2025-05-07T20:03:58.6807824Z U bcmp@GLIBC_2.2.5 2025-05-07T20:03:58.6808175Z U c10::AutogradMetaInterface::~AutogradMetaInterface() 2025-05-07T20:03:58.6808696Z U c10::BFloat16* at::TensorBase::data_ptr() const 2025-05-07T20:03:58.6809189Z U c10::BFloat16* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:58.6809617Z U c10::BoolType::get() 2025-05-07T20:03:58.6810296Z U c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) 2025-05-07T20:03:58.6810871Z U c10::Error::what() const 2025-05-07T20:03:58.6811306Z U c10::Float8_e4m3fn* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:58.6811734Z U c10::FloatType::get() 2025-05-07T20:03:58.6812051Z U c10::GeneratorImpl::device() const 2025-05-07T20:03:58.6812372Z U c10::IValue::isTensorList() const 2025-05-07T20:03:58.6812732Z U c10::IValue::reportToTensorTypeError() const 2025-05-07T20:03:58.6813079Z U c10::IntType::get() 2025-05-07T20:03:58.6813717Z U c10::ListType::get(std::__cxx11::basic_string, std::allocator > const&, c10::Type::SingletonOrSharedTypePtr) 2025-05-07T20:03:58.6814486Z U c10::MessageLogger::MessageLogger(char const*, int, int) 2025-05-07T20:03:58.6814869Z U c10::MessageLogger::~MessageLogger() 2025-05-07T20:03:58.6815409Z U c10::OptionalType::get(c10::Type::SingletonOrSharedTypePtr) 2025-05-07T20:03:58.6816051Z U c10::ScalarTypeType::get() 2025-05-07T20:03:58.6816503Z U c10::StorageImpl::throw_data_ptr_access_error() const 2025-05-07T20:03:58.6816902Z U c10::StringType::get() 2025-05-07T20:03:58.6817263Z U c10::SymBool::guard_bool(char const*, long) const 2025-05-07T20:03:58.6817697Z U c10::SymFloat::guard_float(char const*, long) const 2025-05-07T20:03:58.6818395Z U c10::SymInt::SymInt(c10::intrusive_ptr >) 2025-05-07T20:03:58.6819064Z U c10::SymInt::guard_int(char const*, long) const 2025-05-07T20:03:58.6819455Z U c10::SymInt::promote_to_negative() 2025-05-07T20:03:58.6819797Z U c10::SymInt::toSymNode() const 2025-05-07T20:03:58.6820191Z U c10::SymbolicShapeMeta::init_is_contiguous() const 2025-05-07T20:03:58.6820994Z U c10::TensorImpl::set_autograd_meta(std::unique_ptr >) 2025-05-07T20:03:58.6821712Z U c10::TensorImpl::throw_data_ptr_access_error() const 2025-05-07T20:03:58.6822211Z U c10::TensorType::get() 2025-05-07T20:03:58.6822526Z U c10::UndefinedTensorImpl::_singleton 2025-05-07T20:03:58.6823453Z U c10::Warning::Warning(std::variant, c10::SourceLocation const&, std::__cxx11::basic_string, std::allocator >, bool) 2025-05-07T20:03:58.6824416Z U c10::cuda::CUDACachingAllocator::allocator 2025-05-07T20:03:58.6824766Z U c10::cuda::CUDAStream::stream() const 2025-05-07T20:03:58.6825118Z U c10::cuda::ExchangeDevice(signed char) 2025-05-07T20:03:58.6825450Z U c10::cuda::GetDevice(signed char*) 2025-05-07T20:03:58.6825803Z U c10::cuda::MaybeSetDevice(signed char) 2025-05-07T20:03:58.6826162Z U c10::cuda::SetDevice(signed char) 2025-05-07T20:03:58.6826613Z U c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) 2025-05-07T20:03:58.6827077Z U c10::cuda::current_device() 2025-05-07T20:03:58.6827378Z U c10::cuda::device_count() 2025-05-07T20:03:58.6827720Z U c10::cuda::getCurrentCUDAStream(signed char) 2025-05-07T20:03:58.6828116Z U c10::cuda::getDefaultCUDAStream(signed char) 2025-05-07T20:03:58.6828507Z U c10::cuda::getStreamFromPool(bool, signed char) 2025-05-07T20:03:58.6828915Z U c10::cuda::getStreamFromPool(int, signed char) 2025-05-07T20:03:58.6829301Z U c10::cuda::setCurrentCUDAStream(c10::cuda::CUDAStream) 2025-05-07T20:03:58.6829696Z U c10::cuda::warn_or_error_on_sync() 2025-05-07T20:03:58.6830321Z U c10::detail::ListImpl::ListImpl(std::vector >, c10::Type::SingletonOrSharedTypePtr) 2025-05-07T20:03:58.6831334Z U c10::detail::infer_schema::make_function_schema(c10::ArrayRef, c10::ArrayRef) 2025-05-07T20:03:58.6832178Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) 2025-05-07T20:03:58.6832987Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) 2025-05-07T20:03:58.6833916Z U c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*) 2025-05-07T20:03:58.6834902Z U c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string, std::allocator > const&) 2025-05-07T20:03:58.6835693Z U c10::get_default_dtype() 2025-05-07T20:03:58.6836154Z U c10::impl::ExcludeDispatchKeyGuard::ExcludeDispatchKeyGuard(c10::DispatchKeySet) 2025-05-07T20:03:58.6836723Z U c10::impl::ExcludeDispatchKeyGuard::~ExcludeDispatchKeyGuard() 2025-05-07T20:03:58.6837132Z U c10::impl::GPUTrace::gpuTraceState 2025-05-07T20:03:58.6837471Z U c10::impl::GPUTrace::haveState 2025-05-07T20:03:58.6837844Z U c10::impl::cow::is_cow_data_ptr(c10::DataPtr const&) 2025-05-07T20:03:58.6838250Z U c10::impl::cow::materialize_cow_storage(c10::StorageImpl&) 2025-05-07T20:03:58.6838645Z U c10::impl::device_guard_impl_registry 2025-05-07T20:03:58.6838977Z U c10::operator*(c10::SymInt const&, int) 2025-05-07T20:03:58.6839335Z U c10::operator-(c10::SymInt const&, int) 2025-05-07T20:03:58.6839672Z U c10::operator-(c10::SymInt const&, long) 2025-05-07T20:03:58.6840138Z U c10::operator<<(std::ostream&, c10::Device const&) 2025-05-07T20:03:58.6840533Z U c10::operator<<(std::ostream&, c10::DeviceType) 2025-05-07T20:03:58.6840879Z U c10::throwNullDataPtrError() 2025-05-07T20:03:58.6841214Z U c10::warn(c10::Warning const&) 2025-05-07T20:03:58.6841527Z U c10::warnDeprecatedDataPtr() 2025-05-07T20:03:58.6842212Z U c10d::getNcclErrorDetailStr(ncclResult_t, std::optional, std::allocator > >) 2025-05-07T20:03:58.6842978Z U c10d::ncclGetErrorWithVersion[abi:cxx11](ncclResult_t) 2025-05-07T20:03:58.6843436Z U caffe2::TypeMeta::error_unsupported_typemeta(caffe2::TypeMeta) 2025-05-07T20:03:58.6843864Z U caffe2::TypeMeta::typeMetaDatas() 2025-05-07T20:03:58.6844177Z U cublasLtCreate 2025-05-07T20:03:58.6844467Z U cublasLtMatmul 2025-05-07T20:03:58.6844765Z U cublasLtMatmulAlgoGetHeuristic 2025-05-07T20:03:58.6845115Z U cublasLtMatmulDescCreate 2025-05-07T20:03:58.6845463Z U cublasLtMatmulDescSetAttribute 2025-05-07T20:03:58.6845795Z U cublasLtMatmulPreferenceCreate 2025-05-07T20:03:58.6846152Z U cublasLtMatmulPreferenceSetAttribute 2025-05-07T20:03:58.6846482Z U cublasLtMatrixLayoutCreate 2025-05-07T20:03:58.6846858Z U cudaDeviceGetAttribute@libcudart.so.12 2025-05-07T20:03:58.6847205Z U cudaDeviceSynchronize@libcudart.so.12 2025-05-07T20:03:58.6847568Z U cudaEventCreateWithFlags@libcudart.so.12 2025-05-07T20:03:58.6847912Z U cudaEventDestroy@libcudart.so.12 2025-05-07T20:03:58.6848261Z U cudaEventElapsedTime@libcudart.so.12 2025-05-07T20:03:58.6848603Z U cudaEventQuery@libcudart.so.12 2025-05-07T20:03:58.6848921Z U cudaEventRecord@libcudart.so.12 2025-05-07T20:03:58.6849270Z U cudaEventSynchronize@libcudart.so.12 2025-05-07T20:03:58.6849589Z U cudaFree@libcudart.so.12 2025-05-07T20:03:58.6849919Z U cudaFuncSetAttribute@libcudart.so.12 2025-05-07T20:03:58.6850245Z U cudaGetDevice@libcudart.so.12 2025-05-07T20:03:58.6850602Z U cudaGetDeviceProperties_v2@libcudart.so.12 2025-05-07T20:03:58.6850981Z U cudaGetDriverEntryPoint@libcudart.so.12 2025-05-07T20:03:58.6851321Z U cudaGetErrorName@libcudart.so.12 2025-05-07T20:03:58.6851667Z U cudaGetErrorString@libcudart.so.12 2025-05-07T20:03:58.6852026Z U cudaGetLastError@libcudart.so.12 2025-05-07T20:03:58.6852378Z U cudaIpcGetMemHandle@libcudart.so.12 2025-05-07T20:03:58.6852718Z U cudaIpcOpenMemHandle@libcudart.so.12 2025-05-07T20:03:58.6853091Z U cudaLaunchCooperativeKernel@libcudart.so.12 2025-05-07T20:03:58.6853451Z U cudaLaunchKernel@libcudart.so.12 2025-05-07T20:03:58.6853795Z U cudaLaunchKernelExC@libcudart.so.12 2025-05-07T20:03:58.6854143Z U cudaMalloc@libcudart.so.12 2025-05-07T20:03:58.6854450Z U cudaMemcpy@libcudart.so.12 2025-05-07T20:03:58.6854779Z U cudaMemcpyAsync@libcudart.so.12 2025-05-07T20:03:58.6855101Z U cudaMemsetAsync@libcudart.so.12 2025-05-07T20:03:58.6855548Z U cudaStreamQuery@libcudart.so.12 2025-05-07T20:03:58.6856079Z U cudaStreamSynchronize@libcudart.so.12 2025-05-07T20:03:58.6856471Z U cudaStreamWaitEvent@libcudart.so.12 2025-05-07T20:03:58.6856829Z U exit@GLIBC_2.2.5 2025-05-07T20:03:58.6857125Z U fclose@GLIBC_2.2.5 2025-05-07T20:03:58.6857443Z U fflush@GLIBC_2.2.5 2025-05-07T20:03:58.6857790Z U float* at::TensorBase::data_ptr() const 2025-05-07T20:03:58.6858271Z U float* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:58.6858644Z U fopen@GLIBC_2.2.5 2025-05-07T20:03:58.6858958Z U fprintf@GLIBC_2.2.5 2025-05-07T20:03:58.6859252Z U fread@GLIBC_2.2.5 2025-05-07T20:03:58.6859557Z U fwrite@GLIBC_2.2.5 2025-05-07T20:03:58.6859908Z U int* at::TensorBase::data_ptr() const 2025-05-07T20:03:58.6860334Z U int* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:58.6860791Z U long c10::detail::maybe_wrap_dim_slow(long, long, bool) 2025-05-07T20:03:58.6861227Z U long* at::TensorBase::data_ptr() const 2025-05-07T20:03:58.6861591Z U memcpy@GLIBC_2.14 2025-05-07T20:03:58.6861886Z U memmove@GLIBC_2.2.5 2025-05-07T20:03:58.6862312Z U memset@GLIBC_2.2.5 2025-05-07T20:03:58.6862606Z U ncclAllGather 2025-05-07T20:03:58.6862864Z U ncclAllReduce 2025-05-07T20:03:58.6863137Z U ncclCommInitRank 2025-05-07T20:03:58.6863404Z U ncclGetUniqueId 2025-05-07T20:03:58.6863685Z U ncclReduceScatter 2025-05-07T20:03:58.6863978Z U operator delete(void*)@GLIBCXX_3.4 2025-05-07T20:03:58.6864325Z U operator new(unsigned long)@GLIBCXX_3.4 2025-05-07T20:03:58.6864670Z U printf@GLIBC_2.2.5 2025-05-07T20:03:58.6865039Z U signed char* at::TensorBase::data_ptr() const 2025-05-07T20:03:58.6865516Z U signed char* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:58.6866136Z U std::__cxx11::basic_ostringstream, std::allocator >::basic_ostringstream() 2025-05-07T20:03:58.6866918Z U std::__cxx11::basic_ostringstream, std::allocator >::str() const &@GLIBCXX_3.4.29 2025-05-07T20:03:58.6867747Z U std::__cxx11::basic_ostringstream, std::allocator >::~basic_ostringstream()@GLIBCXX_3.4.21 2025-05-07T20:03:58.6868524Z U std::__cxx11::basic_stringstream, std::allocator >::basic_stringstream() 2025-05-07T20:03:58.6869305Z U std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const@GLIBCXX_3.4.18 2025-05-07T20:03:58.6869876Z U std::__throw_bad_alloc()@GLIBCXX_3.4 2025-05-07T20:03:58.6870224Z U std::__throw_bad_array_new_length() 2025-05-07T20:03:58.6870595Z U std::__throw_bad_cast()@GLIBCXX_3.4 2025-05-07T20:03:58.6870946Z U std::__throw_length_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:58.6871334Z U std::__throw_logic_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:58.6871702Z U std::__throw_system_error(int)@GLIBCXX_3.4.11 2025-05-07T20:03:58.6872177Z U std::basic_ios >::clear(std::_Ios_Iostate)@GLIBCXX_3.4 2025-05-07T20:03:58.6873069Z U std::basic_ostream >& std::__ostream_insert >(std::basic_ostream >&, char const*, long)@GLIBCXX_3.4.9 2025-05-07T20:03:58.6874146Z U std::basic_ostream >& std::endl >(std::basic_ostream >&)@GLIBCXX_3.4 2025-05-07T20:03:58.6875568Z U std::basic_ostream >& std::operator<< >(std::basic_ostream >&, char const*)@GLIBCXX_3.4 2025-05-07T20:03:58.6876766Z U std::basic_ostream >& std::operator<< >(std::basic_ostream >&, unsigned char const*)@GLIBCXX_3.4 2025-05-07T20:03:58.6877607Z U std::cerr@GLIBCXX_3.4 2025-05-07T20:03:58.6877953Z U std::cout@GLIBCXX_3.4 2025-05-07T20:03:58.6878344Z U std::ctype::_M_widen_init() const@GLIBCXX_3.4.11 2025-05-07T20:03:58.6878760Z U std::exception::what() const@GLIBCXX_3.4 2025-05-07T20:03:58.6879161Z U std::exception::~exception()@GLIBCXX_3.4 2025-05-07T20:03:58.6879534Z U std::ios_base::Init::Init()@GLIBCXX_3.4 2025-05-07T20:03:58.6879959Z U std::ios_base::Init::~Init()@GLIBCXX_3.4 2025-05-07T20:03:58.6880320Z U std::ios_base::~ios_base()@GLIBCXX_3.4 2025-05-07T20:03:58.6880692Z U std::locale::~locale()@GLIBCXX_3.4 2025-05-07T20:03:58.6881209Z U std::logic_error::logic_error(char const*)@GLIBCXX_3.4.21 2025-05-07T20:03:58.6881598Z U std::logic_error::~logic_error()@GLIBCXX_3.4 2025-05-07T20:03:58.6882026Z U std::ostream& std::ostream::_M_insert(long)@GLIBCXX_3.4.9 2025-05-07T20:03:58.6882534Z U std::ostream& std::ostream::_M_insert(unsigned long)@GLIBCXX_3.4.9 2025-05-07T20:03:58.6883101Z U std::ostream& std::ostream::_M_insert(void const*)@GLIBCXX_3.4.9 2025-05-07T20:03:58.6883556Z U std::ostream::flush()@GLIBCXX_3.4 2025-05-07T20:03:58.6883931Z U std::ostream::operator<<(int)@GLIBCXX_3.4 2025-05-07T20:03:58.6884294Z U std::ostream::put(char)@GLIBCXX_3.4 2025-05-07T20:03:58.6884677Z U std::runtime_error::runtime_error(char const*)@GLIBCXX_3.4.21 2025-05-07T20:03:58.6885385Z U std::runtime_error::runtime_error(std::__cxx11::basic_string, std::allocator > const&)@GLIBCXX_3.4.21 2025-05-07T20:03:58.6886047Z U std::runtime_error::~runtime_error()@GLIBCXX_3.4 2025-05-07T20:03:58.6886394Z U std::terminate()@GLIBCXX_3.4 2025-05-07T20:03:58.6886714Z U stderr@GLIBC_2.2.5 2025-05-07T20:03:58.6886995Z U strlen@GLIBC_2.2.5 2025-05-07T20:03:58.6887326Z U torch::CppFunction::~CppFunction() 2025-05-07T20:03:58.6888103Z U torch::Library::Library(torch::Library::Kind, std::__cxx11::basic_string, std::allocator >, std::optional, char const*, unsigned int) 2025-05-07T20:03:58.6889225Z U torch::Library::_def(c10::FunctionSchema&&, c10::OperatorName*, std::vector > const&, torch::_RegisterOrVerify) & 2025-05-07T20:03:58.6890061Z U torch::Library::_impl(char const*, torch::CppFunction&&, torch::_RegisterOrVerify) & 2025-05-07T20:03:58.6890883Z U torch::cuda::nccl::all2all(std::vector >&, std::vector >&, void*, c10::cuda::CUDAStream&) 2025-05-07T20:03:58.6891781Z U torch::cuda::nccl::all2all_single_equal_split(at::Tensor&, at::Tensor&, int, void*, c10::cuda::CUDAStream&) 2025-05-07T20:03:58.6892555Z U torch::jit::parseSchema(std::__cxx11::basic_string, std::allocator > const&, bool) 2025-05-07T20:03:58.6893098Z U typeinfo for c10::Error 2025-05-07T20:03:58.6893433Z U typeinfo for std::exception@GLIBCXX_3.4 2025-05-07T20:03:58.6893795Z U typeinfo for std::logic_error@GLIBCXX_3.4 2025-05-07T20:03:58.6894139Z U typeinfo for std::runtime_error@GLIBCXX_3.4 2025-05-07T20:03:58.6894581Z U unsigned char* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:58.6894980Z U usleep@GLIBC_2.2.5 2025-05-07T20:03:58.6895398Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:03:58.6896024Z U vtable for __cxxabiv1::__function_type_info@CXXABI_1.3 2025-05-07T20:03:58.6896482Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:03:58.6896875Z U vtable for c10::Error 2025-05-07T20:03:58.6897414Z U vtable for std::__cxx11::basic_stringbuf, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:58.6898108Z U vtable for std::basic_streambuf >@GLIBCXX_3.4 2025-05-07T20:03:58.6898622Z U vtable for torch::autograd::AutogradMeta 2025-05-07T20:03:58.6899000Z w _ITM_deregisterTMCloneTable 2025-05-07T20:03:58.6899349Z w _ITM_registerTMCloneTable 2025-05-07T20:03:58.6899674Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:03:58.6899997Z w __gmon_start__ 2025-05-07T20:03:58.6900281Z w __pthread_key_create 2025-05-07T20:03:58.6900613Z w pthread_mutex_lock@GLIBC_2.2.5 2025-05-07T20:03:58.6900953Z w pthread_mutex_unlock@GLIBC_2.2.5 2025-05-07T20:03:58.6901344Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:03:58.6901950Z + ldd ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:58.6902396Z 2025-05-07T20:03:58.6932107Z linux-vdso.so.1 (0x00007fffde149000) 2025-05-07T20:03:58.6933211Z libtorch.so => not found 2025-05-07T20:03:58.6935060Z libc10.so => not found 2025-05-07T20:03:58.6935463Z libc10_cuda.so => not found 2025-05-07T20:03:58.6935768Z libnccl.so.2 => not found 2025-05-07T20:03:58.6936043Z libtorch_cpu.so => not found 2025-05-07T20:03:58.6936340Z libtorch_cuda.so => not found 2025-05-07T20:03:58.6936620Z libcudart.so.12 => not found 2025-05-07T20:03:58.6936975Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fe69119c000) 2025-05-07T20:03:58.6937405Z libgomp.so.1 => /lib64/libgomp.so.1 (0x00007fe69719b000) 2025-05-07T20:03:58.6937833Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fe69716d000) 2025-05-07T20:03:58.6938225Z libc.so.6 => /lib64/libc.so.6 (0x00007fe690f94000) 2025-05-07T20:03:58.6938607Z /lib64/ld-linux-x86-64.so.2 (0x00007fe6971f7000) 2025-05-07T20:03:58.6938990Z libm.so.6 => /lib64/libm.so.6 (0x00007fe690eb9000) 2025-05-07T20:03:58.6939227Z 2025-05-07T20:03:58.6939342Z [CHECK] Displaying ELF information: 2025-05-07T20:03:58.6939922Z + readelf -d ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:58.6940394Z 2025-05-07T20:03:58.7134443Z 2025-05-07T20:03:58.7135591Z Dynamic section at offset 0x5ae3168 contains 38 entries: 2025-05-07T20:03:58.7136774Z Tag Type Name/Value 2025-05-07T20:03:58.7138359Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:03:58.7139827Z 0x0000000000000001 (NEEDED) Shared library: [libc10.so] 2025-05-07T20:03:58.7141303Z 0x0000000000000001 (NEEDED) Shared library: [libc10_cuda.so] 2025-05-07T20:03:58.7142810Z 0x0000000000000001 (NEEDED) Shared library: [libnccl.so.2] 2025-05-07T20:03:58.7144176Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:03:58.7144683Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:03:58.7145167Z 0x0000000000000001 (NEEDED) Shared library: [libcudart.so.12] 2025-05-07T20:03:58.7145671Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:03:58.7146153Z 0x0000000000000001 (NEEDED) Shared library: [libgomp.so.1] 2025-05-07T20:03:58.7146646Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:03:58.7147114Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:03:58.7147611Z 0x0000000000000001 (NEEDED) Shared library: [ld-linux-x86-64.so.2] 2025-05-07T20:03:58.7148168Z 0x000000000000000e (SONAME) Library soname: [fbgemm_gpu_experimental_gen_ai.so] 2025-05-07T20:03:58.7148694Z 0x000000000000000c (INIT) 0x15d000 2025-05-07T20:03:58.7149034Z 0x000000000000000d (FINI) 0x5089fc 2025-05-07T20:03:58.7149357Z 0x0000000000000019 (INIT_ARRAY) 0x5ae0d28 2025-05-07T20:03:58.7149719Z 0x000000000000001b (INIT_ARRAYSZ) 1136 (bytes) 2025-05-07T20:03:58.7150279Z 0x000000000000001a (FINI_ARRAY) 0x5ae1198 2025-05-07T20:03:58.7150663Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:58.7151085Z 0x000000006ffffef5 (GNU_HASH) 0x238 2025-05-07T20:03:58.7151419Z 0x0000000000000005 (STRTAB) 0x141b8 2025-05-07T20:03:58.7151771Z 0x0000000000000006 (SYMTAB) 0x4120 2025-05-07T20:03:58.7152314Z 0x000000000000000a (STRSZ) 1239382 (bytes) 2025-05-07T20:03:58.7152706Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:03:58.7153060Z 0x0000000000000003 (PLTGOT) 0x5ae4418 2025-05-07T20:03:58.7153449Z 0x0000000000000002 (PLTRELSZ) 44880 (bytes) 2025-05-07T20:03:58.7153822Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:03:58.7154155Z 0x0000000000000017 (JMPREL) 0x151300 2025-05-07T20:03:58.7154511Z 0x0000000000000007 (RELA) 0x144190 2025-05-07T20:03:58.7154884Z 0x0000000000000008 (RELASZ) 53616 (bytes) 2025-05-07T20:03:58.7155263Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:03:58.7155651Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:03:58.7156010Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:03:58.7156372Z 0x000000006ffffffe (VERNEED) 0x144070 2025-05-07T20:03:58.7156738Z 0x000000006fffffff (VERNEEDNUM) 5 2025-05-07T20:03:58.7157097Z 0x000000006ffffff0 (VERSYM) 0x142b0e 2025-05-07T20:03:58.7157442Z 0x000000006ffffff9 (RELACOUNT) 420 2025-05-07T20:03:58.7157893Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:03:58.7158095Z 2025-05-07T20:03:58.7158222Z ################################################################################ 2025-05-07T20:03:58.7158466Z 2025-05-07T20:03:58.7158470Z 2025-05-07T20:03:58.7158589Z ################################################################################ 2025-05-07T20:03:58.7159248Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:58.7159883Z [CHECK] Listing out library size: 2025-05-07T20:03:58.7160505Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:58.7161016Z 2025-05-07T20:03:58.7161408Z 1 ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:58.7161874Z 2025-05-07T20:03:58.7162403Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:58.7163699Z + objdump -TC ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:03:58.7164443Z 2025-05-07T20:03:58.7206893Z GLIBC_2.2.5 2025-05-07T20:03:58.7207579Z GLIBC_2.14 2025-05-07T20:03:58.7208204Z 2025-05-07T20:03:58.7208309Z 2025-05-07T20:03:58.7210034Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:58.7213670Z + objdump -TC ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:03:58.7214478Z 2025-05-07T20:03:58.7265205Z GLIBCXX_3.4 2025-05-07T20:03:58.7265883Z GLIBCXX_3.4.9 2025-05-07T20:03:58.7266480Z GLIBCXX_3.4.21 2025-05-07T20:03:58.7266858Z 2025-05-07T20:03:58.7267116Z 2025-05-07T20:03:58.7299000Z + nm -gDC ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so > /tmp/tmp.cJwIq9AaCx.symbols.txt 2025-05-07T20:03:58.7300937Z 2025-05-07T20:03:58.7318842Z 2025-05-07T20:03:58.7359285Z [CHECK] Total Number of symbols: 154 2025-05-07T20:03:58.7376966Z [CHECK] Number of fbgemm symbols: 15 2025-05-07T20:03:58.7400967Z + nm -gDCu ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so > /tmp/tmp.n9q7eWYiEV.usymbols.txt 2025-05-07T20:03:58.7401895Z 2025-05-07T20:03:58.7421399Z 2025-05-07T20:03:58.7457245Z [CHECK] Listing out undefined symbols (76 total): 2025-05-07T20:03:58.7480423Z U VTT for std::__cxx11::basic_ostringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:58.7482095Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:03:58.7483126Z U __cudaPopCallConfiguration@libcudart.so.12 2025-05-07T20:03:58.7484263Z U __cudaPushCallConfiguration@libcudart.so.12 2025-05-07T20:03:58.7485371Z U __cudaRegisterFatBinary@libcudart.so.12 2025-05-07T20:03:58.7486028Z U __cudaRegisterFatBinaryEnd@libcudart.so.12 2025-05-07T20:03:58.7486421Z U __cudaRegisterFunction@libcudart.so.12 2025-05-07T20:03:58.7486784Z U __cudaRegisterVar@libcudart.so.12 2025-05-07T20:03:58.7487165Z U __cudaUnregisterFatBinary@libcudart.so.12 2025-05-07T20:03:58.7487736Z U __cxa_allocate_exception@CXXABI_1.3 2025-05-07T20:03:58.7488068Z U __cxa_atexit@GLIBC_2.2.5 2025-05-07T20:03:58.7488403Z U __cxa_begin_catch@CXXABI_1.3 2025-05-07T20:03:58.7488733Z U __cxa_free_exception@CXXABI_1.3 2025-05-07T20:03:58.7489074Z U __cxa_throw@CXXABI_1.3 2025-05-07T20:03:58.7489387Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:03:58.7489876Z U at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) 2025-05-07T20:03:58.7490560Z U at::_ops::to_dtype::call(at::Tensor const&, c10::ScalarType, bool, bool, std::optional) 2025-05-07T20:03:58.7491468Z U at::_ops::zeros::call(c10::ArrayRef, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:03:58.7492214Z U c10::FloatType::get() 2025-05-07T20:03:58.7492587Z U c10::IValue::reportToTensorTypeError() const 2025-05-07T20:03:58.7493016Z U c10::MessageLogger::MessageLogger(char const*, int, int) 2025-05-07T20:03:58.7493502Z U c10::MessageLogger::~MessageLogger() 2025-05-07T20:03:58.7493891Z U c10::SymFloat::guard_float(char const*, long) const 2025-05-07T20:03:58.7494281Z U c10::TensorType::get() 2025-05-07T20:03:58.7494616Z U c10::UndefinedTensorImpl::_singleton 2025-05-07T20:03:58.7495512Z U c10::detail::infer_schema::make_function_schema(c10::ArrayRef, c10::ArrayRef) 2025-05-07T20:03:58.7496518Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) 2025-05-07T20:03:58.7497390Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) 2025-05-07T20:03:58.7498368Z U c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*) 2025-05-07T20:03:58.7499426Z U c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string, std::allocator > const&) 2025-05-07T20:03:58.7500334Z U caffe2::TypeMeta::error_unsupported_typemeta(caffe2::TypeMeta) 2025-05-07T20:03:58.7500862Z U cudaGetErrorString@libcudart.so.12 2025-05-07T20:03:58.7501232Z U cudaGetLastError@libcudart.so.12 2025-05-07T20:03:58.7501581Z U cudaLaunchKernel@libcudart.so.12 2025-05-07T20:03:58.7501978Z U float* at::TensorBase::data_ptr() const 2025-05-07T20:03:58.7502415Z U long c10::detail::maybe_wrap_dim_slow(long, long, bool) 2025-05-07T20:03:58.7502870Z U memcpy@GLIBC_2.14 2025-05-07T20:03:58.7503165Z U memmove@GLIBC_2.2.5 2025-05-07T20:03:58.7503475Z U memset@GLIBC_2.2.5 2025-05-07T20:03:58.7503783Z U ncclCommDestroy 2025-05-07T20:03:58.7504070Z U ncclCommInitAll 2025-05-07T20:03:58.7504395Z U operator delete(void*)@GLIBCXX_3.4 2025-05-07T20:03:58.7504746Z U operator new(unsigned long)@GLIBCXX_3.4 2025-05-07T20:03:58.7505350Z U std::__cxx11::basic_ostringstream, std::allocator >::basic_ostringstream() 2025-05-07T20:03:58.7506197Z U std::__cxx11::basic_ostringstream, std::allocator >::~basic_ostringstream()@GLIBCXX_3.4.21 2025-05-07T20:03:58.7506836Z U std::__throw_bad_alloc()@GLIBCXX_3.4 2025-05-07T20:03:58.7507223Z U std::__throw_length_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:58.7507659Z U std::__throw_logic_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:58.7508171Z U std::basic_ios >::clear(std::_Ios_Iostate)@GLIBCXX_3.4 2025-05-07T20:03:58.7509131Z U std::basic_ostream >& std::__ostream_insert >(std::basic_ostream >&, char const*, long)@GLIBCXX_3.4.9 2025-05-07T20:03:58.7509941Z U std::ios_base::Init::Init()@GLIBCXX_3.4 2025-05-07T20:03:58.7510322Z U std::ios_base::Init::~Init()@GLIBCXX_3.4 2025-05-07T20:03:58.7510682Z U std::ios_base::~ios_base()@GLIBCXX_3.4 2025-05-07T20:03:58.7511044Z U std::locale::~locale()@GLIBCXX_3.4 2025-05-07T20:03:58.7511457Z U std::ostream& std::ostream::_M_insert(long)@GLIBCXX_3.4.9 2025-05-07T20:03:58.7511910Z U std::ostream::operator<<(int)@GLIBCXX_3.4 2025-05-07T20:03:58.7512608Z U std::runtime_error::runtime_error(std::__cxx11::basic_string, std::allocator > const&)@GLIBCXX_3.4.21 2025-05-07T20:03:58.7513302Z U std::runtime_error::~runtime_error()@GLIBCXX_3.4 2025-05-07T20:03:58.7513730Z U std::terminate()@GLIBCXX_3.4 2025-05-07T20:03:58.7514056Z U strlen@GLIBC_2.2.5 2025-05-07T20:03:58.7514396Z U torch::CppFunction::~CppFunction() 2025-05-07T20:03:58.7515243Z U torch::Library::Library(torch::Library::Kind, std::__cxx11::basic_string, std::allocator >, std::optional, char const*, unsigned int) 2025-05-07T20:03:58.7516412Z U torch::Library::_def(c10::FunctionSchema&&, c10::OperatorName*, std::vector > const&, torch::_RegisterOrVerify) & 2025-05-07T20:03:58.7517268Z U torch::Library::_impl(char const*, torch::CppFunction&&, torch::_RegisterOrVerify) & 2025-05-07T20:03:58.7518030Z U torch::jit::parseSchema(std::__cxx11::basic_string, std::allocator > const&, bool) 2025-05-07T20:03:58.7518648Z U typeinfo for std::runtime_error@GLIBCXX_3.4 2025-05-07T20:03:58.7519065Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:03:58.7519498Z U vtable for __cxxabiv1::__function_type_info@CXXABI_1.3 2025-05-07T20:03:58.7519958Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:03:58.7520620Z U vtable for std::__cxx11::basic_stringbuf, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:58.7521297Z U vtable for std::basic_streambuf >@GLIBCXX_3.4 2025-05-07T20:03:58.7521772Z w _ITM_deregisterTMCloneTable 2025-05-07T20:03:58.7522104Z w _ITM_registerTMCloneTable 2025-05-07T20:03:58.7522477Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:03:58.7522796Z w __gmon_start__ 2025-05-07T20:03:58.7523076Z w __pthread_key_create 2025-05-07T20:03:58.7523438Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:03:58.7524060Z + ldd ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:58.7524557Z 2025-05-07T20:03:58.7534509Z linux-vdso.so.1 (0x00007fff40fc7000) 2025-05-07T20:03:58.7534846Z libtorch.so => not found 2025-05-07T20:03:58.7535149Z libc10.so => not found 2025-05-07T20:03:58.7535542Z libnccl.so.2 => not found 2025-05-07T20:03:58.7535808Z libtorch_cpu.so => not found 2025-05-07T20:03:58.7536112Z libtorch_cuda.so => not found 2025-05-07T20:03:58.7536397Z libcudart.so.12 => not found 2025-05-07T20:03:58.7536757Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007ff043167000) 2025-05-07T20:03:58.7537189Z libgomp.so.1 => /lib64/libgomp.so.1 (0x00007ff043111000) 2025-05-07T20:03:58.7537672Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ff0430e3000) 2025-05-07T20:03:58.7538062Z libc.so.6 => /lib64/libc.so.6 (0x00007ff042edb000) 2025-05-07T20:03:58.7538458Z libm.so.6 => /lib64/libm.so.6 (0x00007ff042e00000) 2025-05-07T20:03:58.7538849Z /lib64/ld-linux-x86-64.so.2 (0x00007ff043445000) 2025-05-07T20:03:58.7539095Z 2025-05-07T20:03:58.7539210Z [CHECK] Displaying ELF information: 2025-05-07T20:03:58.7539825Z + readelf -d ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:58.7540319Z 2025-05-07T20:03:58.7577704Z 2025-05-07T20:03:58.7578065Z Dynamic section at offset 0x71978 contains 36 entries: 2025-05-07T20:03:58.7578492Z Tag Type Name/Value 2025-05-07T20:03:58.7578959Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:03:58.7579462Z 0x0000000000000001 (NEEDED) Shared library: [libc10.so] 2025-05-07T20:03:58.7579992Z 0x0000000000000001 (NEEDED) Shared library: [libnccl.so.2] 2025-05-07T20:03:58.7580525Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:03:58.7581045Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:03:58.7581737Z 0x0000000000000001 (NEEDED) Shared library: [libcudart.so.12] 2025-05-07T20:03:58.7582263Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:03:58.7582795Z 0x0000000000000001 (NEEDED) Shared library: [libgomp.so.1] 2025-05-07T20:03:58.7583307Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:03:58.7583823Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:03:58.7584412Z 0x000000000000000e (SONAME) Library soname: [fbgemm_gpu_experimental_example_py.so] 2025-05-07T20:03:58.7584913Z 0x000000000000000c (INIT) 0x5000 2025-05-07T20:03:58.7585269Z 0x000000000000000d (FINI) 0x98dc 2025-05-07T20:03:58.7585606Z 0x0000000000000019 (INIT_ARRAY) 0x727d0 2025-05-07T20:03:58.7585966Z 0x000000000000001b (INIT_ARRAYSZ) 32 (bytes) 2025-05-07T20:03:58.7586317Z 0x000000000000001a (FINI_ARRAY) 0x727f0 2025-05-07T20:03:58.7586676Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:58.7587038Z 0x000000006ffffef5 (GNU_HASH) 0x200 2025-05-07T20:03:58.7587371Z 0x0000000000000005 (STRTAB) 0x1448 2025-05-07T20:03:58.7587772Z 0x0000000000000006 (SYMTAB) 0x5c0 2025-05-07T20:03:58.7588125Z 0x000000000000000a (STRSZ) 9973 (bytes) 2025-05-07T20:03:58.7588502Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:03:58.7588859Z 0x0000000000000003 (PLTGOT) 0x72c08 2025-05-07T20:03:58.7589242Z 0x0000000000000002 (PLTRELSZ) 2208 (bytes) 2025-05-07T20:03:58.7589593Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:03:58.7589945Z 0x0000000000000017 (JMPREL) 0x4530 2025-05-07T20:03:58.7590378Z 0x0000000000000007 (RELA) 0x3d38 2025-05-07T20:03:58.7590730Z 0x0000000000000008 (RELASZ) 2040 (bytes) 2025-05-07T20:03:58.7591121Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:03:58.7591454Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:03:58.7591805Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:03:58.7592154Z 0x000000006ffffffe (VERNEED) 0x3c78 2025-05-07T20:03:58.7592506Z 0x000000006fffffff (VERNEEDNUM) 4 2025-05-07T20:03:58.7592831Z 0x000000006ffffff0 (VERSYM) 0x3b3e 2025-05-07T20:03:58.7593172Z 0x000000006ffffff9 (RELACOUNT) 7 2025-05-07T20:03:58.7593498Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:03:58.7593714Z 2025-05-07T20:03:58.7593832Z ################################################################################ 2025-05-07T20:03:58.7594080Z 2025-05-07T20:03:58.7594134Z 2025-05-07T20:03:58.7594254Z ################################################################################ 2025-05-07T20:03:58.7594691Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T20:03:58.7595134Z [CHECK] Listing out library size: 2025-05-07T20:03:58.7595554Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T20:03:58.7595867Z 2025-05-07T20:03:58.7596013Z 1 ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T20:03:58.7596888Z 2025-05-07T20:03:58.7598372Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T20:03:58.7599285Z + objdump -TC ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:03:58.7599845Z 2025-05-07T20:03:58.7667589Z GLIBC_2.2.5 2025-05-07T20:03:58.7667855Z GLIBC_2.14 2025-05-07T20:03:58.7670647Z 2025-05-07T20:03:58.7670652Z 2025-05-07T20:03:58.7671031Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T20:03:58.7671964Z + objdump -TC ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:03:58.7672662Z 2025-05-07T20:03:58.7737936Z GLIBCXX_3.4 2025-05-07T20:03:58.7739524Z 2025-05-07T20:03:58.7739812Z 2025-05-07T20:03:58.7764943Z + nm -gDC ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so > /tmp/tmp.bAVrB4i9du.symbols.txt 2025-05-07T20:03:58.7765390Z 2025-05-07T20:03:58.7800044Z 2025-05-07T20:03:58.7839613Z [CHECK] Total Number of symbols: 841 2025-05-07T20:03:58.7856955Z [CHECK] Number of fbgemm symbols: 0 2025-05-07T20:03:58.7877568Z + nm -gDCu ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so > /tmp/tmp.0vlm18d05r.usymbols.txt 2025-05-07T20:03:58.7878829Z 2025-05-07T20:03:58.7897854Z 2025-05-07T20:03:58.7927335Z [CHECK] Listing out undefined symbols (51 total): 2025-05-07T20:03:58.7940209Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:03:58.7941236Z U __cxa_begin_catch@CXXABI_1.3 2025-05-07T20:03:58.7942165Z U __cxa_guard_acquire@CXXABI_1.3 2025-05-07T20:03:58.7943107Z U __cxa_guard_release@CXXABI_1.3 2025-05-07T20:03:58.7944134Z U __errno_location@GLIBC_2.2.5 2025-05-07T20:03:58.7944463Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:03:58.7944796Z U abort@GLIBC_2.2.5 2025-05-07T20:03:58.7945078Z U bcmp@GLIBC_2.2.5 2025-05-07T20:03:58.7945526Z U close@GLIBC_2.2.5 2025-05-07T20:03:58.7945813Z U fputs@GLIBC_2.2.5 2025-05-07T20:03:58.7946111Z U free@GLIBC_2.2.5 2025-05-07T20:03:58.7946399Z U ftruncate64@GLIBC_2.2.5 2025-05-07T20:03:58.7946720Z U fwrite@GLIBC_2.2.5 2025-05-07T20:03:58.7947038Z U getenv@GLIBC_2.2.5 2025-05-07T20:03:58.7947330Z U getpagesize@GLIBC_2.2.5 2025-05-07T20:03:58.7947828Z U madvise@GLIBC_2.2.5 2025-05-07T20:03:58.7948120Z U malloc@GLIBC_2.2.5 2025-05-07T20:03:58.7948424Z U memcmp@GLIBC_2.2.5 2025-05-07T20:03:58.7948814Z U memcpy@GLIBC_2.14 2025-05-07T20:03:58.7949103Z U memmove@GLIBC_2.2.5 2025-05-07T20:03:58.7949377Z U memset@GLIBC_2.2.5 2025-05-07T20:03:58.7949654Z U mmap@GLIBC_2.2.5 2025-05-07T20:03:58.7949920Z U mprotect@GLIBC_2.2.5 2025-05-07T20:03:58.7950214Z U munmap@GLIBC_2.2.5 2025-05-07T20:03:58.7950495Z U open64@GLIBC_2.2.5 2025-05-07T20:03:58.7950780Z U operator delete(void*)@GLIBCXX_3.4 2025-05-07T20:03:58.7951123Z U pthread_mutex_destroy@GLIBC_2.2.5 2025-05-07T20:03:58.7951445Z U pthread_mutex_lock@GLIBC_2.2.5 2025-05-07T20:03:58.7951772Z U pthread_mutex_unlock@GLIBC_2.2.5 2025-05-07T20:03:58.7952119Z U read@GLIBC_2.2.5 2025-05-07T20:03:58.7952406Z U realloc@GLIBC_2.2.5 2025-05-07T20:03:58.7952683Z U shm_open@GLIBC_2.2.5 2025-05-07T20:03:58.7952979Z U shm_unlink@GLIBC_2.2.5 2025-05-07T20:03:58.7953280Z U snprintf@GLIBC_2.2.5 2025-05-07T20:03:58.7953587Z U std::terminate()@GLIBCXX_3.4 2025-05-07T20:03:58.7953900Z U stderr@GLIBC_2.2.5 2025-05-07T20:03:58.7954171Z U strcmp@GLIBC_2.2.5 2025-05-07T20:03:58.7954457Z U strlen@GLIBC_2.2.5 2025-05-07T20:03:58.7954725Z U strtol@GLIBC_2.2.5 2025-05-07T20:03:58.7955010Z U syscall@GLIBC_2.2.5 2025-05-07T20:03:58.7955279Z U sysconf@GLIBC_2.2.5 2025-05-07T20:03:58.7955558Z U uname@GLIBC_2.2.5 2025-05-07T20:03:58.7955834Z U unlink@GLIBC_2.2.5 2025-05-07T20:03:58.7956100Z U vsnprintf@GLIBC_2.2.5 2025-05-07T20:03:58.7956444Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:03:58.7956840Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:03:58.7957258Z U vtable for __cxxabiv1::__vmi_class_type_info@CXXABI_1.3 2025-05-07T20:03:58.7957662Z w _ITM_deregisterTMCloneTable 2025-05-07T20:03:58.7957981Z w _ITM_registerTMCloneTable 2025-05-07T20:03:58.7958289Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:03:58.7958575Z w __gmon_start__ 2025-05-07T20:03:58.7958900Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:03:58.7959277Z + ldd ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T20:03:58.7959530Z 2025-05-07T20:03:58.7992341Z linux-vdso.so.1 (0x00007ffe80fe8000) 2025-05-07T20:03:58.7993303Z libtorch_cpu.so => not found 2025-05-07T20:03:58.7994082Z libtorch_cuda.so => not found 2025-05-07T20:03:58.7994381Z libtorch.so => not found 2025-05-07T20:03:58.7994703Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fd60e747000) 2025-05-07T20:03:58.7995153Z libgomp.so.1 => /lib64/libgomp.so.1 (0x00007fd60e6f1000) 2025-05-07T20:03:58.7995548Z librt.so.1 => /lib64/librt.so.1 (0x00007fd60e6ea000) 2025-05-07T20:03:58.7995961Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fd60e6bc000) 2025-05-07T20:03:58.7996392Z libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd60e6b7000) 2025-05-07T20:03:58.7996816Z libc.so.6 => /lib64/libc.so.6 (0x00007fd60e4af000) 2025-05-07T20:03:58.7997190Z libm.so.6 => /lib64/libm.so.6 (0x00007fd60e3d4000) 2025-05-07T20:03:58.7997813Z /lib64/ld-linux-x86-64.so.2 (0x00007fd60ea27000) 2025-05-07T20:03:58.7998176Z 2025-05-07T20:03:58.7998302Z [CHECK] Displaying ELF information: 2025-05-07T20:03:58.7998661Z + readelf -d ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T20:03:58.7998938Z 2025-05-07T20:03:58.8035962Z 2025-05-07T20:03:58.8036712Z Dynamic section at offset 0x74dd0 contains 35 entries: 2025-05-07T20:03:58.8037917Z Tag Type Name/Value 2025-05-07T20:03:58.8039443Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:03:58.8041008Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:03:58.8042561Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:03:58.8043215Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:03:58.8043733Z 0x0000000000000001 (NEEDED) Shared library: [libgomp.so.1] 2025-05-07T20:03:58.8044229Z 0x0000000000000001 (NEEDED) Shared library: [librt.so.1] 2025-05-07T20:03:58.8044733Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:03:58.8045230Z 0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0] 2025-05-07T20:03:58.8045740Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:03:58.8046294Z 0x000000000000000e (SONAME) Library soname: [asmjit.so] 2025-05-07T20:03:58.8046719Z 0x000000000000000c (INIT) 0x19000 2025-05-07T20:03:58.8047064Z 0x000000000000000d (FINI) 0x56a1c 2025-05-07T20:03:58.8047386Z 0x0000000000000019 (INIT_ARRAY) 0x74ff8 2025-05-07T20:03:58.8047742Z 0x000000000000001b (INIT_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:58.8048080Z 0x000000000000001a (FINI_ARRAY) 0x75000 2025-05-07T20:03:58.8048605Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:58.8048955Z 0x000000006ffffef5 (GNU_HASH) 0x200 2025-05-07T20:03:58.8049599Z 0x0000000000000005 (STRTAB) 0x7120 2025-05-07T20:03:58.8049926Z 0x0000000000000006 (SYMTAB) 0x2230 2025-05-07T20:03:58.8050286Z 0x000000000000000a (STRSZ) 48790 (bytes) 2025-05-07T20:03:58.8050656Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:03:58.8050999Z 0x0000000000000003 (PLTGOT) 0x76050 2025-05-07T20:03:58.8051369Z 0x0000000000000002 (PLTRELSZ) 8472 (bytes) 2025-05-07T20:03:58.8051719Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:03:58.8052056Z 0x0000000000000017 (JMPREL) 0x16a58 2025-05-07T20:03:58.8052442Z 0x0000000000000007 (RELA) 0x13710 2025-05-07T20:03:58.8052810Z 0x0000000000000008 (RELASZ) 13128 (bytes) 2025-05-07T20:03:58.8053185Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:03:58.8053513Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:03:58.8053860Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:03:58.8054207Z 0x000000006ffffffe (VERNEED) 0x13650 2025-05-07T20:03:58.8054560Z 0x000000006fffffff (VERNEEDNUM) 5 2025-05-07T20:03:58.8054888Z 0x000000006ffffff0 (VERSYM) 0x12fb6 2025-05-07T20:03:58.8055367Z 0x000000006ffffff9 (RELACOUNT) 3 2025-05-07T20:03:58.8055690Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:03:58.8055917Z 2025-05-07T20:03:58.8056036Z ################################################################################ 2025-05-07T20:03:58.8056270Z 2025-05-07T20:03:58.8056274Z 2025-05-07T20:03:58.8056463Z ################################################################################ 2025-05-07T20:03:58.8056900Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 2025-05-07T20:03:58.8057341Z [CHECK] Listing out library size: 2025-05-07T20:03:58.8057735Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 2025-05-07T20:03:58.8058060Z 2025-05-07T20:03:58.8058272Z 6 ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 2025-05-07T20:03:58.8058517Z 2025-05-07T20:03:58.8058859Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 2025-05-07T20:03:58.8059721Z + objdump -TC ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:03:58.8060278Z 2025-05-07T20:03:58.8331826Z GLIBC_2.2.5 2025-05-07T20:03:58.8332743Z GLIBC_2.3 2025-05-07T20:03:58.8332970Z GLIBC_2.14 2025-05-07T20:03:58.8333094Z 2025-05-07T20:03:58.8333212Z 2025-05-07T20:03:58.8333596Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 2025-05-07T20:03:58.8334490Z + objdump -TC ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:03:58.8335052Z 2025-05-07T20:03:58.8606911Z GLIBCXX_3.4 2025-05-07T20:03:58.8607554Z GLIBCXX_3.4.9 2025-05-07T20:03:58.8608240Z GLIBCXX_3.4.11 2025-05-07T20:03:58.8608865Z GLIBCXX_3.4.14 2025-05-07T20:03:58.8609442Z GLIBCXX_3.4.15 2025-05-07T20:03:58.8610031Z GLIBCXX_3.4.18 2025-05-07T20:03:58.8610603Z GLIBCXX_3.4.21 2025-05-07T20:03:58.8610982Z 2025-05-07T20:03:58.8610997Z 2025-05-07T20:03:58.8627107Z + nm -gDC ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so > /tmp/tmp.pOEzrUO85C.symbols.txt 2025-05-07T20:03:58.8628705Z 2025-05-07T20:03:58.8862653Z 2025-05-07T20:03:58.8888606Z [CHECK] Total Number of symbols: 4951 2025-05-07T20:03:58.8908038Z [CHECK] Number of fbgemm symbols: 3554 2025-05-07T20:03:58.8928736Z + nm -gDCu ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so > /tmp/tmp.UHe39dnuMK.usymbols.txt 2025-05-07T20:03:58.8930010Z 2025-05-07T20:03:58.8956161Z 2025-05-07T20:03:58.8981598Z [CHECK] Listing out undefined symbols (133 total): 2025-05-07T20:03:58.8997471Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:03:58.8998414Z U __cxa_allocate_exception@CXXABI_1.3 2025-05-07T20:03:58.8998920Z U __cxa_atexit@GLIBC_2.2.5 2025-05-07T20:03:58.8999269Z U __cxa_begin_catch@CXXABI_1.3 2025-05-07T20:03:58.8999614Z U __cxa_end_catch@CXXABI_1.3 2025-05-07T20:03:58.8999976Z U __cxa_free_exception@CXXABI_1.3 2025-05-07T20:03:58.9000311Z U __cxa_guard_abort@CXXABI_1.3 2025-05-07T20:03:58.9000655Z U __cxa_guard_acquire@CXXABI_1.3 2025-05-07T20:03:58.9000991Z U __cxa_guard_release@CXXABI_1.3 2025-05-07T20:03:58.9001361Z U __cxa_init_primary_exception@CXXABI_1.3.11 2025-05-07T20:03:58.9001710Z U __cxa_rethrow@CXXABI_1.3 2025-05-07T20:03:58.9002358Z U __cxa_thread_atexit@CXXABI_1.3.7 2025-05-07T20:03:58.9002709Z U __cxa_throw@CXXABI_1.3 2025-05-07T20:03:58.9003004Z U __extendhfsf2@GCC_12.0.0 2025-05-07T20:03:58.9003332Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:03:58.9003671Z U __once_proxy@GLIBCXX_3.4.11 2025-05-07T20:03:58.9003990Z U __tls_get_addr@GLIBC_2.3 2025-05-07T20:03:58.9004290Z U __truncsfhf2@GCC_12.0.0 2025-05-07T20:03:58.9004604Z U abort@GLIBC_2.2.5 2025-05-07T20:03:58.9005090Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:03:58.9005920Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:03:58.9006844Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:03:58.9008027Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:03:58.9009090Z U asmjit::_abi_1_13::BaseEmitter::emitArgsAssignment(asmjit::_abi_1_13::FuncFrame const&, asmjit::_abi_1_13::FuncArgsAssignment const&) 2025-05-07T20:03:58.9009852Z U asmjit::_abi_1_13::BaseEmitter::emitEpilog(asmjit::_abi_1_13::FuncFrame const&) 2025-05-07T20:03:58.9010426Z U asmjit::_abi_1_13::BaseEmitter::emitProlog(asmjit::_abi_1_13::FuncFrame const&) 2025-05-07T20:03:58.9012513Z U asmjit::_abi_1_13::CodeHolder::CodeHolder(asmjit::_abi_1_13::Support::Temporary const*) 2025-05-07T20:03:58.9013145Z U asmjit::_abi_1_13::CodeHolder::init(asmjit::_abi_1_13::Environment const&, unsigned long) 2025-05-07T20:03:58.9013630Z U asmjit::_abi_1_13::CodeHolder::~CodeHolder() 2025-05-07T20:03:58.9014159Z U asmjit::_abi_1_13::FuncArgsAssignment::updateFuncFrame(asmjit::_abi_1_13::FuncFrame&) const 2025-05-07T20:03:58.9014888Z U asmjit::_abi_1_13::FuncDetail::init(asmjit::_abi_1_13::FuncSignature const&, asmjit::_abi_1_13::Environment const&) 2025-05-07T20:03:58.9015572Z U asmjit::_abi_1_13::FuncFrame::finalize() 2025-05-07T20:03:58.9016217Z U asmjit::_abi_1_13::FuncFrame::init(asmjit::_abi_1_13::FuncDetail const&) 2025-05-07T20:03:58.9016912Z U asmjit::_abi_1_13::JitRuntime::JitRuntime(asmjit::_abi_1_13::JitAllocator::CreateParams const*) 2025-05-07T20:03:58.9017561Z U asmjit::_abi_1_13::JitRuntime::~JitRuntime() 2025-05-07T20:03:58.9018047Z U asmjit::_abi_1_13::x86::Assembler::Assembler(asmjit::_abi_1_13::CodeHolder*) 2025-05-07T20:03:58.9018529Z U asmjit::_abi_1_13::x86::Assembler::~Assembler() 2025-05-07T20:03:58.9018903Z U bcmp@GLIBC_2.2.5 2025-05-07T20:03:58.9019193Z U ceilf@GLIBC_2.2.5 2025-05-07T20:03:58.9019516Z U cpuinfo_get_packages 2025-05-07T20:03:58.9019831Z U cpuinfo_get_packages_count 2025-05-07T20:03:58.9020161Z U cpuinfo_initialize 2025-05-07T20:03:58.9020469Z U cpuinfo_isa 2025-05-07T20:03:58.9020737Z U floor@GLIBC_2.2.5 2025-05-07T20:03:58.9021039Z U fma@GLIBC_2.2.5 2025-05-07T20:03:58.9021319Z U fmaf@GLIBC_2.2.5 2025-05-07T20:03:58.9021613Z U free@GLIBC_2.2.5 2025-05-07T20:03:58.9022015Z U fwrite@GLIBC_2.2.5 2025-05-07T20:03:58.9022305Z U getenv@GLIBC_2.2.5 2025-05-07T20:03:58.9022577Z U ldexp@GLIBC_2.2.5 2025-05-07T20:03:58.9022856Z U log2@GLIBC_2.2.5 2025-05-07T20:03:58.9023154Z U log2f@GLIBC_2.2.5 2025-05-07T20:03:58.9023438Z U lrintf@GLIBC_2.2.5 2025-05-07T20:03:58.9023728Z U memcpy@GLIBC_2.14 2025-05-07T20:03:58.9023996Z U memmove@GLIBC_2.2.5 2025-05-07T20:03:58.9024289Z U memset@GLIBC_2.2.5 2025-05-07T20:03:58.9024562Z U nearbyint@GLIBC_2.2.5 2025-05-07T20:03:58.9024865Z U nearbyintf@GLIBC_2.2.5 2025-05-07T20:03:58.9025167Z U operator delete(void*)@GLIBCXX_3.4 2025-05-07T20:03:58.9025506Z U operator delete[](void*)@GLIBCXX_3.4 2025-05-07T20:03:58.9025844Z U operator new(unsigned long)@GLIBCXX_3.4 2025-05-07T20:03:58.9026202Z U operator new[](unsigned long)@GLIBCXX_3.4 2025-05-07T20:03:58.9026546Z U posix_memalign@GLIBC_2.2.5 2025-05-07T20:03:58.9026835Z U sqrtf@GLIBC_2.2.5 2025-05-07T20:03:58.9027232Z U std::_Hash_bytes(void const*, unsigned long, unsigned long)@CXXABI_1.3.5 2025-05-07T20:03:58.9027702Z U std::_Rb_tree_decrement(std::_Rb_tree_node_base*)@GLIBCXX_3.4 2025-05-07T20:03:58.9028143Z U std::_Rb_tree_increment(std::_Rb_tree_node_base*)@GLIBCXX_3.4 2025-05-07T20:03:58.9028816Z U std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)@GLIBCXX_3.4 2025-05-07T20:03:58.9029520Z U std::__atomic_futex_unsigned_base::_M_futex_notify_all(unsigned int*)@GLIBCXX_3.4.21 2025-05-07T20:03:58.9030525Z U std::__atomic_futex_unsigned_base::_M_futex_wait_until(unsigned int*, unsigned int, bool, std::chrono::duration >, std::chrono::duration >)@GLIBCXX_3.4.21 2025-05-07T20:03:58.9031679Z U std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const@GLIBCXX_3.4.18 2025-05-07T20:03:58.9032365Z U std::__detail::_Prime_rehash_policy::_M_next_bkt(unsigned long) const@GLIBCXX_3.4.18 2025-05-07T20:03:58.9032861Z U std::__exception_ptr::exception_ptr::_M_addref() 2025-05-07T20:03:58.9033236Z U std::__exception_ptr::exception_ptr::_M_release() 2025-05-07T20:03:58.9033697Z U std::__exception_ptr::exception_ptr::exception_ptr(void*)@CXXABI_1.3.11 2025-05-07T20:03:58.9034194Z U std::__future_base::_Result_base::_Result_base()@GLIBCXX_3.4.15 2025-05-07T20:03:58.9034640Z U std::__future_base::_Result_base::~_Result_base()@GLIBCXX_3.4.15 2025-05-07T20:03:58.9035044Z U std::__once_call@GLIBCXX_3.4.11 2025-05-07T20:03:58.9035450Z U std::__once_callable@GLIBCXX_3.4.11 2025-05-07T20:03:58.9035800Z U std::__throw_bad_alloc()@GLIBCXX_3.4 2025-05-07T20:03:58.9036155Z U std::__throw_bad_array_new_length() 2025-05-07T20:03:58.9036474Z U std::__throw_bad_cast()@GLIBCXX_3.4 2025-05-07T20:03:58.9036831Z U std::__throw_bad_function_call()@GLIBCXX_3.4.14 2025-05-07T20:03:58.9037194Z U std::__throw_future_error(int)@GLIBCXX_3.4.14 2025-05-07T20:03:58.9037581Z U std::__throw_length_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:58.9037948Z U std::__throw_system_error(int)@GLIBCXX_3.4.11 2025-05-07T20:03:58.9038498Z U std::bad_alloc::~bad_alloc()@GLIBCXX_3.4 2025-05-07T20:03:58.9039316Z U std::basic_ostream >& std::__ostream_insert >(std::basic_ostream >&, char const*, long)@GLIBCXX_3.4.9 2025-05-07T20:03:58.9040091Z U std::cerr@GLIBCXX_3.4 2025-05-07T20:03:58.9040420Z U std::cout@GLIBCXX_3.4 2025-05-07T20:03:58.9040775Z U std::ctype::_M_widen_init() const@GLIBCXX_3.4.11 2025-05-07T20:03:58.9041392Z U std::future_category()@GLIBCXX_3.4.15 2025-05-07T20:03:58.9041790Z U std::future_error::~future_error()@GLIBCXX_3.4.14 2025-05-07T20:03:58.9042174Z U std::ios_base::Init::Init()@GLIBCXX_3.4 2025-05-07T20:03:58.9042559Z U std::ios_base::Init::~Init()@GLIBCXX_3.4 2025-05-07T20:03:58.9043216Z U std::logic_error::logic_error(std::__cxx11::basic_string, std::allocator > const&)@GLIBCXX_3.4.21 2025-05-07T20:03:58.9043967Z U std::logic_error::logic_error(std::logic_error const&)@GLIBCXX_3.4.21 2025-05-07T20:03:58.9044515Z U std::ostream& std::ostream::_M_insert(double)@GLIBCXX_3.4.9 2025-05-07T20:03:58.9045024Z U std::ostream& std::ostream::_M_insert(long)@GLIBCXX_3.4.9 2025-05-07T20:03:58.9045592Z U std::ostream& std::ostream::_M_insert(unsigned long)@GLIBCXX_3.4.9 2025-05-07T20:03:58.9046073Z U std::ostream::flush()@GLIBCXX_3.4 2025-05-07T20:03:58.9046453Z U std::ostream::operator<<(int)@GLIBCXX_3.4 2025-05-07T20:03:58.9046840Z U std::ostream::put(char)@GLIBCXX_3.4 2025-05-07T20:03:58.9047284Z U std::rethrow_exception(std::__exception_ptr::exception_ptr)@CXXABI_1.3.3 2025-05-07T20:03:58.9047867Z U std::runtime_error::runtime_error(char const*)@GLIBCXX_3.4.21 2025-05-07T20:03:58.9048308Z U std::runtime_error::~runtime_error()@GLIBCXX_3.4 2025-05-07T20:03:58.9048689Z U std::terminate()@GLIBCXX_3.4 2025-05-07T20:03:58.9049022Z U stderr@GLIBC_2.2.5 2025-05-07T20:03:58.9049312Z U strcmp@GLIBC_2.2.5 2025-05-07T20:03:58.9049629Z U strlen@GLIBC_2.2.5 2025-05-07T20:03:58.9049948Z U strstr@GLIBC_2.2.5 2025-05-07T20:03:58.9050249Z U tolower@GLIBC_2.2.5 2025-05-07T20:03:58.9050548Z U toupper@GLIBC_2.2.5 2025-05-07T20:03:58.9050951Z U typeinfo for std::__future_base::_Result_base@GLIBCXX_3.4.15 2025-05-07T20:03:58.9051377Z U typeinfo for std::bad_alloc@GLIBCXX_3.4 2025-05-07T20:03:58.9051768Z U typeinfo for std::future_error@GLIBCXX_3.4.14 2025-05-07T20:03:58.9052174Z U typeinfo for std::runtime_error@GLIBCXX_3.4 2025-05-07T20:03:58.9052568Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:03:58.9053009Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:03:58.9053406Z U vtable for std::bad_alloc@GLIBCXX_3.4 2025-05-07T20:03:58.9053785Z U vtable for std::future_error@GLIBCXX_3.4.14 2025-05-07T20:03:58.9054204Z w _ITM_deregisterTMCloneTable 2025-05-07T20:03:58.9054534Z w _ITM_registerTMCloneTable 2025-05-07T20:03:58.9054868Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:03:58.9055173Z w __gmon_start__ 2025-05-07T20:03:58.9055588Z w __pthread_key_create 2025-05-07T20:03:58.9055912Z w pthread_mutex_lock@GLIBC_2.2.5 2025-05-07T20:03:58.9056291Z w pthread_mutex_unlock@GLIBC_2.2.5 2025-05-07T20:03:58.9056611Z w pthread_once 2025-05-07T20:03:58.9056919Z w pthread_rwlock_rdlock 2025-05-07T20:03:58.9057250Z w pthread_rwlock_unlock 2025-05-07T20:03:58.9057554Z w pthread_rwlock_wrlock 2025-05-07T20:03:58.9057883Z w pthread_self@GLIBC_2.2.5 2025-05-07T20:03:58.9058244Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:03:58.9058665Z + ldd ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 2025-05-07T20:03:58.9058929Z 2025-05-07T20:03:58.9059068Z linux-vdso.so.1 (0x00007ffd3db11000) 2025-05-07T20:03:58.9059387Z libc10.so => not found 2025-05-07T20:03:58.9059978Z asmjit.so => /__w/FBGEMM/FBGEMM/fbgemm_gpu/./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so (0x00007fa4a900e000) 2025-05-07T20:03:58.9060558Z libtorch.so => not found 2025-05-07T20:03:58.9060844Z libtorch_cpu.so => not found 2025-05-07T20:03:58.9061123Z libtorch_cuda.so => not found 2025-05-07T20:03:58.9061479Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fa4a879c000) 2025-05-07T20:03:58.9061884Z libm.so.6 => /lib64/libm.so.6 (0x00007fa4a86c1000) 2025-05-07T20:03:58.9062290Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fa4a8fde000) 2025-05-07T20:03:58.9062681Z libc.so.6 => /lib64/libc.so.6 (0x00007fa4a84b9000) 2025-05-07T20:03:58.9063077Z /lib64/ld-linux-x86-64.so.2 (0x00007fa4a908a000) 2025-05-07T20:03:58.9063434Z libtorch_cpu.so => not found 2025-05-07T20:03:58.9063714Z libtorch_cuda.so => not found 2025-05-07T20:03:58.9064006Z libtorch.so => not found 2025-05-07T20:03:58.9064324Z libgomp.so.1 => /lib64/libgomp.so.1 (0x00007fa4a8f86000) 2025-05-07T20:03:58.9064742Z librt.so.1 => /lib64/librt.so.1 (0x00007fa4a8f81000) 2025-05-07T20:03:58.9065163Z libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa4a8f7c000) 2025-05-07T20:03:58.9065466Z 2025-05-07T20:03:58.9065580Z [CHECK] Displaying ELF information: 2025-05-07T20:03:58.9065973Z + readelf -d ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 2025-05-07T20:03:58.9066259Z 2025-05-07T20:03:58.9080913Z 2025-05-07T20:03:58.9081307Z Dynamic section at offset 0x54b548 contains 37 entries: 2025-05-07T20:03:58.9081715Z Tag Type Name/Value 2025-05-07T20:03:58.9082182Z 0x0000000000000001 (NEEDED) Shared library: [libc10.so] 2025-05-07T20:03:58.9082712Z 0x0000000000000001 (NEEDED) Shared library: [asmjit.so] 2025-05-07T20:03:58.9083307Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:03:58.9083828Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:03:58.9084428Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:03:58.9084955Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:03:58.9085480Z 0x0000000000000001 (NEEDED) Shared library: [libm.so.6] 2025-05-07T20:03:58.9086002Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:03:58.9086506Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:03:58.9087041Z 0x0000000000000001 (NEEDED) Shared library: [ld-linux-x86-64.so.2] 2025-05-07T20:03:58.9087564Z 0x000000000000000e (SONAME) Library soname: [fbgemm.so] 2025-05-07T20:03:58.9088068Z 0x000000000000000f (RPATH) Library rpath: [$ORIGIN] 2025-05-07T20:03:58.9088480Z 0x000000000000000c (INIT) 0xfd000 2025-05-07T20:03:58.9088896Z 0x000000000000000d (FINI) 0x4bfc58 2025-05-07T20:03:58.9089257Z 0x0000000000000019 (INIT_ARRAY) 0x548040 2025-05-07T20:03:58.9089617Z 0x000000000000001b (INIT_ARRAYSZ) 1224 (bytes) 2025-05-07T20:03:58.9089997Z 0x000000000000001a (FINI_ARRAY) 0x548508 2025-05-07T20:03:58.9090347Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:58.9090711Z 0x000000006ffffef5 (GNU_HASH) 0x238 2025-05-07T20:03:58.9091045Z 0x0000000000000005 (STRTAB) 0x24d98 2025-05-07T20:03:58.9091388Z 0x0000000000000006 (SYMTAB) 0x7d58 2025-05-07T20:03:58.9105419Z 0x000000000000000a (STRSZ) 754228 (bytes) 2025-05-07T20:03:58.9105809Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:03:58.9106175Z 0x0000000000000003 (PLTGOT) 0x54b7d8 2025-05-07T20:03:58.9106549Z 0x0000000000000002 (PLTRELSZ) 25992 (bytes) 2025-05-07T20:03:58.9106932Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:03:58.9107292Z 0x0000000000000017 (JMPREL) 0xf6410 2025-05-07T20:03:58.9107635Z 0x0000000000000007 (RELA) 0xdf7f0 2025-05-07T20:03:58.9108126Z 0x0000000000000008 (RELASZ) 93216 (bytes) 2025-05-07T20:03:58.9108485Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:03:58.9108988Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:03:58.9109314Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:03:58.9109690Z 0x000000006ffffffe (VERNEED) 0xdf680 2025-05-07T20:03:58.9110040Z 0x000000006fffffff (VERNEEDNUM) 5 2025-05-07T20:03:58.9110368Z 0x000000006ffffff0 (VERSYM) 0xdcfcc 2025-05-07T20:03:58.9110724Z 0x000000006ffffff9 (RELACOUNT) 155 2025-05-07T20:03:58.9111035Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:03:58.9111284Z 2025-05-07T20:03:58.9111400Z ################################################################################ 2025-05-07T20:03:58.9111628Z 2025-05-07T20:03:58.9111635Z 2025-05-07T20:03:58.9111851Z [CHECK] Verifying sample subset of symbols in the built libraries ... 2025-05-07T20:03:58.9406946Z [CHECK] Found symbol in ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so: fbgemm_gpu::per_tensor_quantize_i8 2025-05-07T20:03:58.9409355Z ################################################################################ 2025-05-07T20:03:58.9410955Z [BUILD] Wheel Audit: dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:03:58.9412231Z 2025-05-07T20:03:58.9414029Z + conda run --no-capture-output -n build_binary auditwheel show dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:03:58.9414631Z 2025-05-07T20:04:02.6669603Z 2025-05-07T20:04:02.6670187Z fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:04:02.6670787Z is consistent with the following platform tag: "linux_x86_64". 2025-05-07T20:04:02.6671114Z 2025-05-07T20:04:02.6671282Z The wheel references external versioned symbols in these 2025-05-07T20:04:02.6671953Z system-provided shared libraries: librt.so.1 with versions 2025-05-07T20:04:02.6672375Z {'GLIBC_2.2.5'}, libgcc_s.so.1 with versions {'GCC_3.0', 2025-05-07T20:04:02.6672807Z 'GCC_12.0.0'}, libstdc++.so.6 with versions {'CXXABI_1.3.11', 2025-05-07T20:04:02.6673239Z 'GLIBCXX_3.4.14', 'CXXABI_1.3.5', 'CXXABI_1.3', 'GLIBCXX_3.4.9', 2025-05-07T20:04:02.6673723Z 'GLIBCXX_3.4.18', 'GLIBCXX_3.4.29', 'GLIBCXX_3.4.11', 'CXXABI_1.3.3', 2025-05-07T20:04:02.6674224Z 'GLIBCXX_3.4.21', 'GLIBCXX_3.4', 'GLIBCXX_3.4.15', 'CXXABI_1.3.7'}, 2025-05-07T20:04:02.6674866Z libc.so.6 with versions {'GLIBC_2.17', 'GLIBC_2.14', 'GLIBC_2.3', 2025-05-07T20:04:02.6675319Z 'GLIBC_2.2.5', 'GLIBC_2.3.3', 'GLIBC_2.3.2', 'GLIBC_2.6'}, 2025-05-07T20:04:02.6675743Z libpthread.so.0 with versions {'GLIBC_2.3.4', 'GLIBC_2.2.5'}, 2025-05-07T20:04:02.6676246Z libm.so.6 with versions {'GLIBC_2.2.5'}, libcudart.so.12 with versions 2025-05-07T20:04:02.6676845Z {'libcudart.so.12'}, libdl.so.2 with versions {'GLIBC_2.3.4', 2025-05-07T20:04:02.6677214Z 'GLIBC_2.2.5'} 2025-05-07T20:04:02.6677346Z 2025-05-07T20:04:02.6677575Z This constrains the platform tag to "manylinux_2_35_x86_64". In order 2025-05-07T20:04:02.6678089Z to achieve a more compatible tag, you would need to recompile a new 2025-05-07T20:04:02.6678579Z wheel from source on a system with earlier versions of these 2025-05-07T20:04:02.6678987Z libraries, such as a recent manylinux image. 2025-05-07T20:04:02.7553431Z 2025-05-07T20:04:02.7553521Z 2025-05-07T20:04:02.7554099Z ################################################################################ 2025-05-07T20:04:02.7555101Z [BUILD] Enumerating the built wheels ... 2025-05-07T20:04:02.7555594Z + ls -lth dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:04:02.7555992Z 2025-05-07T20:04:02.7625397Z -rw-r--r--. 1 root root 19M May 7 20:03 dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:04:02.7627608Z 2025-05-07T20:04:02.7628457Z [BUILD] Enumerating the wheel SHAs ... 2025-05-07T20:04:02.7637553Z + sha1sum dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:04:02.7638016Z 2025-05-07T20:04:02.8009196Z e6e36b113f85d3aaa465a028688a068480db398f dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:04:02.8010882Z 2025-05-07T20:04:02.8015785Z + sha256sum dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:04:02.8016992Z 2025-05-07T20:04:02.8837048Z ad0b4412d9939ed191fe39ed235330a3031fd537afb4dd426cb9ce0834b66e07 dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:04:02.8837778Z 2025-05-07T20:04:02.8845409Z + md5sum dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:04:02.8846623Z 2025-05-07T20:04:02.9176144Z 74a01928743b9ea024408833cc9e2c10 dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:04:02.9177724Z 2025-05-07T20:04:02.9178220Z [BUILD] FBGEMM-GPU build + package completed 2025-05-07T20:04:03.0980839Z ##[group]Run actions/upload-artifact@v4 2025-05-07T20:04:03.0981217Z with: 2025-05-07T20:04:03.0981472Z name: fbgemm_genai_x86_clang_py3.13_cu12.8.0.whl 2025-05-07T20:04:03.0981845Z path: fbgemm_gpu/dist/*.whl 2025-05-07T20:04:03.0982151Z if-no-files-found: error 2025-05-07T20:04:03.0982420Z compression-level: 6 2025-05-07T20:04:03.0982692Z overwrite: false 2025-05-07T20:04:03.0982946Z include-hidden-files: false 2025-05-07T20:04:03.0983247Z env: 2025-05-07T20:04:03.0983472Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T20:04:03.0983804Z BUILD_ENV: build_binary 2025-05-07T20:04:03.0984058Z BUILD_TARGET: genai 2025-05-07T20:04:03.0984314Z BUILD_VARIANT: cuda 2025-05-07T20:04:03.0984562Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T20:04:03.0984833Z ##[endgroup] 2025-05-07T20:04:03.0988774Z ##[command]/usr/bin/docker exec 619e625bd71e3b5994636768304a866b1a468a4f4ab5c062f71de4323d5ef686 sh -c "cat /etc/*release | grep ^ID" 2025-05-07T20:04:04.3416419Z With the provided path, there will be 1 file uploaded 2025-05-07T20:04:04.3417991Z Artifact name is valid! 2025-05-07T20:04:04.3418742Z Root directory input is valid! 2025-05-07T20:04:04.4696658Z Beginning upload of artifact content to blob storage 2025-05-07T20:04:05.3860157Z Uploaded bytes 8388608 2025-05-07T20:04:05.5889200Z Uploaded bytes 16777216 2025-05-07T20:04:05.6723924Z Uploaded bytes 18517235 2025-05-07T20:04:05.6891422Z Finished uploading artifact content to blob storage! 2025-05-07T20:04:05.6892188Z SHA256 digest of uploaded artifact zip is 2c430e283306050771ed0148f8bc0ff9c88d696c9122c4b4956d4418e1e568bd 2025-05-07T20:04:05.6892809Z Finalizing artifact upload 2025-05-07T20:04:05.7618228Z Artifact fbgemm_genai_x86_clang_py3.13_cu12.8.0.whl.zip successfully finalized. Artifact ID 3081408483 2025-05-07T20:04:05.7619175Z Artifact fbgemm_genai_x86_clang_py3.13_cu12.8.0.whl has been successfully uploaded! Final size is 18517235 bytes. Artifact ID is 3081408483 2025-05-07T20:04:05.7628507Z Artifact download URL: https://github.com/pytorch/FBGEMM/actions/runs/14891846252/artifacts/3081408483 2025-05-07T20:04:05.7966213Z Post job cleanup. 2025-05-07T20:04:05.7971707Z ##[command]/usr/bin/docker exec 619e625bd71e3b5994636768304a866b1a468a4f4ab5c062f71de4323d5ef686 sh -c "cat /etc/*release | grep ^ID" 2025-05-07T20:04:06.1034173Z [command]/usr/bin/git version 2025-05-07T20:04:06.1389402Z git version 2.47.1 2025-05-07T20:04:06.1437289Z Copying '/github/home/.gitconfig' to '/__w/_temp/8a775a52-4864-4f3d-969d-1b8ed192f340/.gitconfig' 2025-05-07T20:04:06.1456616Z Temporarily overriding HOME='/__w/_temp/8a775a52-4864-4f3d-969d-1b8ed192f340' before making global git config changes 2025-05-07T20:04:06.1457460Z Adding repository directory to the temporary git global config as a safe directory 2025-05-07T20:04:06.1458153Z [command]/usr/bin/git config --global --add safe.directory /__w/FBGEMM/FBGEMM 2025-05-07T20:04:06.1526239Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-05-07T20:04:06.1554995Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-05-07T20:04:06.2182219Z Entering 'external/asmjit' 2025-05-07T20:04:06.2327989Z Entering 'external/composable_kernel' 2025-05-07T20:04:06.2498135Z Entering 'external/cpuinfo' 2025-05-07T20:04:06.2612512Z Entering 'external/cutlass' 2025-05-07T20:04:06.2799172Z Entering 'external/googletest' 2025-05-07T20:04:06.2928998Z Entering 'external/hipify_torch' 2025-05-07T20:04:06.3062462Z Entering 'external/json' 2025-05-07T20:04:06.3160832Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-05-07T20:04:06.3184309Z http.https://github.com/.extraheader 2025-05-07T20:04:06.3192153Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-05-07T20:04:06.3229196Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-05-07T20:04:06.3503334Z Entering 'external/asmjit' 2025-05-07T20:04:06.3548583Z http.https://github.com/.extraheader 2025-05-07T20:04:06.3586296Z Entering 'external/composable_kernel' 2025-05-07T20:04:06.3620209Z http.https://github.com/.extraheader 2025-05-07T20:04:06.3677249Z Entering 'external/cpuinfo' 2025-05-07T20:04:06.3709419Z http.https://github.com/.extraheader 2025-05-07T20:04:06.3744298Z Entering 'external/cutlass' 2025-05-07T20:04:06.3778147Z http.https://github.com/.extraheader 2025-05-07T20:04:06.3834811Z Entering 'external/googletest' 2025-05-07T20:04:06.3869439Z http.https://github.com/.extraheader 2025-05-07T20:04:06.3898417Z Entering 'external/hipify_torch' 2025-05-07T20:04:06.3933601Z http.https://github.com/.extraheader 2025-05-07T20:04:06.3962392Z Entering 'external/json' 2025-05-07T20:04:06.3996273Z http.https://github.com/.extraheader 2025-05-07T20:04:06.4185641Z Stop and remove container: 0703a8b60f604284bbb3ddbf668bc6f3_amazonlinux2023_76f532 2025-05-07T20:04:06.4190525Z ##[command]/usr/bin/docker rm --force 619e625bd71e3b5994636768304a866b1a468a4f4ab5c062f71de4323d5ef686 2025-05-07T20:04:07.6292454Z 619e625bd71e3b5994636768304a866b1a468a4f4ab5c062f71de4323d5ef686 2025-05-07T20:04:07.6326829Z Remove container network: github_network_994bcd5a2cdf4c50adc703a7ba6189a2 2025-05-07T20:04:07.6331529Z ##[command]/usr/bin/docker network rm github_network_994bcd5a2cdf4c50adc703a7ba6189a2 2025-05-07T20:04:08.6379188Z github_network_994bcd5a2cdf4c50adc703a7ba6189a2 2025-05-07T20:04:08.6418892Z A job completed hook has been configured by the self-hosted runner administrator 2025-05-07T20:04:08.6634656Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh' 2025-05-07T20:04:08.6640179Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-05-07T20:04:08.6640610Z ##[endgroup] 2025-05-07T20:04:20.7187259Z Cleaning up orphan processes